
Nicholas Roberts
@nick11roberts
Followers: 1K · Following: 2K · Media: 75 · Statuses: 522
Ph.D. student @WisconsinCS. Working on foundation models and breaking past scaling laws. Previously CMU @mldcmu, UCSD @ucsd_cse, FCC @fresnocity. 🤔🤨🧐 e/hmm
Madison, WI
Joined April 2012
Check out our pre-@COLM_conf blog post on Manticore: creating *pretrained hybrids* from pretrained LMs! 👨‍🌾 🦁 🦂 Headed to Montreal for @COLM_conf next week? Let's nerd out on 📉 scaling laws 📈, 🐍 hybrids 🐺, and 🤖 agents 💻! Blog post + meeting signup below 👇 #COLM2025
💬 1 · 🔁 2 · ❤️ 11
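The core Manticore idea in miniature: rather than training a hybrid from scratch, take blocks from two pretrained LMs and learn how to mix their outputs. Below is a minimal PyTorch sketch of that idea; the stand-in blocks and the sigmoid gate are illustrative, not Manticore's actual API.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Mixes the outputs of a block from each pretrained model with a learned gate."""
    def __init__(self, transformer_block: nn.Module, ssm_block: nn.Module):
        super().__init__()
        self.transformer_block = transformer_block   # e.g. a layer lifted from a GPT
        self.ssm_block = ssm_block                   # e.g. a layer lifted from Mamba
        self.alpha = nn.Parameter(torch.zeros(1))    # learned mixture weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.alpha)                # convex combination in (0, 1)
        return w * self.transformer_block(x) + (1 - w) * self.ssm_block(x)

# Freeze both pretrained blocks and train only the mixture weights,
# so building the hybrid never requires training from scratch.
blk = HybridBlock(nn.Linear(64, 64), nn.Linear(64, 64))  # Linear = stand-in block
for p in list(blk.transformer_block.parameters()) + list(blk.ssm_block.parameters()):
    p.requires_grad_(False)
out = blk(torch.randn(2, 16, 64))                    # (batch, seq, dim)
```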
The coolest trend in AI is the shift from conversation to action—less talking and more doing. This is also a great opportunity for evals: we need benchmarks that measure utility, including in an economic sense. @terminalbench is my favorite effort of this type!
💬 1 · 🔁 16 · ❤️ 27
Sliding window attention (SWA) is powering frontier hybrid models for efficiency. Is there something better? Introducing Phalanx, a faster and better-quality drop-in replacement for SWA. Phalanx is a new family of hardware- and numerics-aware windowed …
💬 12 · 🔁 47 · ❤️ 186
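For context, this is the baseline being replaced: in sliding window attention each query attends only to a fixed window of recent keys. A naive masked-softmax reference in PyTorch, not the hardware- and numerics-aware kernel Phalanx itself describes.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    # q, k, v: (batch, heads, seq, head_dim)
    seq = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    i = torch.arange(seq).unsqueeze(1)              # query positions
    j = torch.arange(seq).unsqueeze(0)              # key positions
    # Causal + windowed: each query attends to keys in [i - window + 1, i].
    mask = (j <= i) & (j > i - window)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

out = sliding_window_attention(
    torch.randn(1, 4, 128, 32), torch.randn(1, 4, 128, 32),
    torch.randn(1, 4, 128, 32), window=16)
```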
We just released the largest open-source diffusion language model (RND1). RND1 is important to me on a personal level: it symbolizes our commitment to open-source exploration of radically different designs for AI at scale — training objectives, architectures, domains. There is …
💬 9 · 🔁 39 · ❤️ 331
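For readers new to DLMs, generation differs from autoregressive decoding: start from an all-mask sequence and commit the model's most confident predictions over several parallel refinement steps. A generic sketch of that loop in PyTorch; it is not RND1's actual sampler.

```python
import torch

def diffusion_decode(model, seq_len: int, mask_id: int, steps: int = 8):
    # Start fully masked; iteratively commit the most confident predictions.
    tokens = torch.full((1, seq_len), mask_id)
    for step in range(steps):
        still_masked = tokens.eq(mask_id)
        if not still_masked.any():
            break
        logits = model(tokens)                        # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)  # only rank masked slots
        n_unmask = max(1, int(still_masked.sum()) // (steps - step))
        idx = conf.topk(n_unmask, dim=-1).indices     # most confident positions
        tokens.scatter_(1, idx, pred.gather(1, idx))  # commit those tokens
    return tokens

# Usage with a stand-in "model" (random logits over a 100-token vocab):
decoded = diffusion_decode(
    lambda t: torch.randn(t.shape[0], t.shape[1], 100),
    seq_len=32, mask_id=0)
```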
🔭 Towards Extending Open dLLMs to 131k Tokens
dLLMs behave differently from autoregressive models—they lack attention sinks, making long-context extension tricky. A few simple tweaks go a long way!! ✍️ blog https://t.co/Epf2y2Lnsk 💻 code https://t.co/c04Cj5iT1y
💬 5 · 🔁 50 · ❤️ 205
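As one concrete example of a "simple tweak" for long context (a common one in general, not necessarily the one this blog uses): position interpolation for RoPE, which rescales positions beyond the trained length back into the trained range.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                train_len: int = 4096, target_len: int = 131072):
    # Position interpolation: map position p in [0, target_len) back to
    # p * train_len / target_len, inside the range seen during training.
    scaled = positions.float() * (train_len / target_len)
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    return torch.outer(scaled, inv_freq)      # (seq, dim // 2) rotation angles

angles = rope_angles(torch.arange(131072), dim=64)
cos, sin = angles.cos(), angles.sin()         # feed into rotary attention as usual
```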
Hello world, we wanted to share an early preview of what we're building at @RadicalNumerics!
💬 3 · 🔁 11 · ❤️ 105
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to …
💬 102 · 🔁 259 · ❤️ 1K
Super excited to present our new work on hybrid architecture models—getting the best of Transformers and SSMs like Mamba—at #COLM2025! Come chat with @nick11roberts at poster session 2 on Tuesday. Thread below! (1)
💬 2 · 🔁 24 · ❤️ 70
✍️ Blog post: https://t.co/EZM7VaW0BZ 📅 Meeting signup: https://t.co/45OxVLGGf6 🎟️ Poster session: Tuesday 4:30–6:30 — come say hi!
nick11roberts.science
TL;DR: Manticore automatically builds hybrid language models by combining pretrained components (like Transformers and Mamba) instead of training from scratch. …
💬 0 · 🔁 0 · ❤️ 3
Excited for @COLM_conf next week! I'm taking meetings on scaling laws, hybrid LLMs (Transformer↔SSM/Mamba), and agents. 🎓 I'm also graduating and open to chatting about future opportunities. Grab a slot! https://t.co/45OxVLGGf6 FYI: Tue 4:30–6:30 I'll be at my poster #COLM2025
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
💬 0 · 🔁 14 · ❤️ 21
Just read this new paper from Anthropic’s very own Claude called “A Mathematical Theory of Communication” and my brain is broken 🤯
💬 0 · 🔁 0 · ❤️ 8
More data ≠ better fine-tuning. Sometimes the 'undertrained' model is more useful because it's still plastic. We need to map the capability vs. adaptability frontier better.
💬 14 · 🔁 19 · ❤️ 223
first i thought scaling laws originated in OpenAI (2020) then i thought they came from Baidu (2017) now i am enlightened: Scaling Laws were first explored at Bell Labs (1993)
💬 51 · 🔁 165 · ❤️ 2K
“Rubrics” have become a buzzword in AI, but the concept predates the hype. At @SnorkelAI, we’re excited to share a fun primer on what rubric-based evaluation is—and why it’s critical for today’s generative and agentic models.
💬 1 · 🔁 15 · ❤️ 33
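In miniature, rubric-based evaluation scores an output against explicit, named criteria and aggregates, instead of asking for one holistic grade. The criteria below are toy stand-ins; real rubric items are usually scored by an LLM or a human.

```python
from typing import Callable

# Each rubric item is a named, checkable criterion (toy keyword checks here).
RUBRIC: dict[str, Callable[[str], float]] = {
    "cites_sources": lambda out: 1.0 if "http" in out else 0.0,
    "within_length": lambda out: 1.0 if len(out.split()) <= 200 else 0.0,
    "no_hedging":    lambda out: 0.0 if "maybe" in out.lower() else 1.0,
}

def score(output: str) -> dict[str, float]:
    # Score every criterion separately, then aggregate to a total.
    per_criterion = {name: fn(output) for name, fn in RUBRIC.items()}
    per_criterion["total"] = sum(per_criterion.values()) / len(RUBRIC)
    return per_criterion

print(score("See https://example.com for details."))
```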
LLM judges are powerful for automated evaluation but expensive and biased. 📣 Meet PAJAMA, a new framework that distills LLM judging logic into a compact, executable form (a new representation), cutting costs from thousands of dollars to just cents. 🚀 We'll present at ICML PRAL on Friday!
💬 1 · 🔁 8 · ❤️ 27
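A hedged sketch of the idea as the tweet describes it: pay for one expensive judge call that writes the judging logic down as a program, then run that program on every example for cents. `call_llm` is a hypothetical stand-in for any LLM client, not PAJAMA's API.

```python
def distill_judge(call_llm, task_description: str):
    # One expensive judge call, amortized over the whole eval set: the LLM
    # writes its criteria down as executable code instead of judging each item.
    prompt = (f"Write a Python function judge(response: str) -> bool that "
              f"encodes your criteria for judging: {task_description}")
    code = call_llm(prompt)
    namespace: dict = {}
    exec(code, namespace)        # must be sandboxed/reviewed in a real system!
    return namespace["judge"]    # compact, executable judging logic

# Usage with a canned "LLM" so the sketch runs end to end:
fake_llm = lambda _prompt: "def judge(response):\n    return 'because' in response"
judge = distill_judge(fake_llm, "answers must include a justification")
print(judge("Yes, because the lemma holds."))  # True
```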
Next up this morning at #ICML2025, we will be presenting our work on pseudolabeling-based semi-supervised learning (SSL). East Exhibition Hall A&B, poster #E-1304, 11 am to 1:30 pm. Paper: https://t.co/o59Bh44MGU Pseudolabeling-based SSL relies on the model's confidence scores and …
💬 0 · 🔁 6 · ❤️ 14
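The pseudolabeling recipe the tweet builds on, in a few lines: keep the model's predictions on unlabeled data only when its confidence clears a threshold, then train on those predictions as if they were labels. A standard sketch, not the paper's method.

```python
import torch

def pseudolabel(model, unlabeled: torch.Tensor, threshold: float = 0.95):
    # Predict on unlabeled data and keep only high-confidence predictions.
    with torch.no_grad():
        probs = model(unlabeled).softmax(-1)        # (n, num_classes)
    conf, labels = probs.max(-1)
    keep = conf >= threshold                        # trust only confident preds
    return unlabeled[keep], labels[keep]

# Usage with a stand-in model to show the shapes:
x_unlab = torch.randn(1000, 20)
model = torch.nn.Linear(20, 5)
x_pseudo, y_pseudo = pseudolabel(model, x_unlab, threshold=0.5)
```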
Join us today in the morning poster session at #ICML2025. We will talk about some neat ways to reduce uncertainty and improve LLM accuracy at test-time on multi-choice tasks (e.g., tool selection) using conformal prediction and an additional inference round. 📍 East …
💬 0 · 🔁 5 · ❤️ 11
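For background, split conformal prediction on multiple-choice scores works like this: calibrate a score threshold on held-out labeled examples, then return a prediction set that contains the true choice with probability about 1 − α; the "additional inference round" would re-query the LLM on just that set. A sketch of the general technique, not the paper's exact procedure.

```python
import numpy as np

def calibrate(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float = 0.1):
    # Nonconformity score: 1 - probability assigned to the true choice.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample-corrected quantile for ~(1 - alpha) coverage.
    return np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def prediction_set(test_probs: np.ndarray, q: float):
    return np.where(1.0 - test_probs <= q)[0]       # choices that "conform"

cal_probs = np.random.dirichlet(np.ones(4), size=500)  # 500 cal examples, 4 choices
cal_labels = np.random.randint(0, 4, size=500)
q = calibrate(cal_probs, cal_labels)
print(prediction_set(np.array([0.5, 0.3, 0.15, 0.05]), q))
```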
Excited to be at ICML'25!!! I'll present papers on improving LLM inference and evaluation and on pseudolabeling-based semi-supervised learning. Come and say hi during these sessions, or chat anytime during the week! [C1]. Prune 'n Predict: Optimizing LLM Decision-making with …
💬 1 · 🔁 10 · ❤️ 24
Heading to #ICML! I’ll be representing SprocketLab at @UWMadison and @SnorkelAI. Reach out if you want to chat about data-centric AI, data development, agents, and foundation models.
💬 1 · 🔁 9 · ❤️ 39
📝 Paper: https://t.co/is0QgHBeKx 💻 Code: Soon! Huge thanks to my amazing co-authors @srguo24, @Zhiqi_Gao_2001, @srinath_namburi, @SonNicCr, @ChengjunWu75220, @ChengyuD27, and @fredsala, plus the @COLM_conf reviewers for their feedback! What hybrid would you build? [21/n]
arxiv.org
While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right...
💬 0 · 🔁 3 · ❤️ 6