
Nicholas Roberts
@nick11roberts
Followers: 1K · Following: 2K · Media: 75 · Statuses: 522
Ph.D. student @WisconsinCS. Working on foundation models and breaking past scaling laws. Previously CMU @mldcmu, UCSD @ucsd_cse, FCC @fresnocity. 🤔🤨🧐 e/hmm
Madison, WI
Joined April 2012
Check out our pre-@COLM_conf blog post on Manticore: creating *pretrained hybrids* from pretrained LMs! 👨‍🌾 🦁 🦂 Headed to Montreal for @COLM_conf next week? Let's nerd out on 📉 scaling laws 📈, 🐍 hybrids 🐺, and 🤖 agents 💻! Blog post + meeting signup below 👇 #COLM2025
💬 1 · 🔁 2 · ❤️ 11
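The core Manticore idea in miniature: rather than training a hybrid from scratch, take blocks from two pretrained LMs and learn how to mix their outputs. Below is a minimal PyTorch sketch of that idea; the stand-in blocks and the sigmoid gate are illustrative, not Manticore's actual API.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Mixes the outputs of a block from each pretrained model with a learned gate."""
    def __init__(self, transformer_block: nn.Module, ssm_block: nn.Module):
        super().__init__()
        self.transformer_block = transformer_block   # e.g. a layer lifted from a GPT
        self.ssm_block = ssm_block                   # e.g. a layer lifted from Mamba
        self.alpha = nn.Parameter(torch.zeros(1))    # learned mixture weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.alpha)                # convex combination in (0, 1)
        return w * self.transformer_block(x) + (1 - w) * self.ssm_block(x)

# Freeze both pretrained blocks and train only the mixture weights,
# so building the hybrid never requires training from scratch.
blk = HybridBlock(nn.Linear(64, 64), nn.Linear(64, 64))  # Linear = stand-in block
for p in list(blk.transformer_block.parameters()) + list(blk.ssm_block.parameters()):
    p.requires_grad_(False)
out = blk(torch.randn(2, 16, 64))                    # (batch, seq, dim)
```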
The coolest trend in AI is the shift from conversation to action—less talking and more doing. This is also a great opportunity for evals: we need benchmarks that measure utility, including in an economic sense. @terminalbench is my favorite effort of this type!
💬 1 · 🔁 16 · ❤️ 27
Sliding window attention (SWA) is powering frontier hybrid models for efficiency. Is there something better? Introducing Phalanx, a faster and better-quality drop-in replacement for SWA. Phalanx is a new family of hardware- and numerics-aware windowed …
💬 12 · 🔁 47 · ❤️ 186
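For context, this is the baseline being replaced: in sliding window attention each query attends only to a fixed window of recent keys. A naive masked-softmax reference in PyTorch, not the hardware- and numerics-aware kernel Phalanx itself describes.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    # q, k, v: (batch, heads, seq, head_dim)
    seq = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    i = torch.arange(seq).unsqueeze(1)              # query positions
    j = torch.arange(seq).unsqueeze(0)              # key positions
    # Causal + windowed: each query attends to keys in [i - window + 1, i].
    mask = (j <= i) & (j > i - window)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

out = sliding_window_attention(
    torch.randn(1, 4, 128, 32), torch.randn(1, 4, 128, 32),
    torch.randn(1, 4, 128, 32), window=16)
```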
We just released the largest open-source diffusion language model (RND1). RND1 is important to me on a personal level: it symbolizes our commitment to open-source exploration of radically different designs for AI at scale — training objectives, architectures, domains. There is …
💬 9 · 🔁 39 · ❤️ 331
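For readers new to DLMs, generation differs from autoregressive decoding: start from an all-mask sequence and commit the model's most confident predictions over several parallel refinement steps. A generic sketch of that loop in PyTorch; it is not RND1's actual sampler.

```python
import torch

def diffusion_decode(model, seq_len: int, mask_id: int, steps: int = 8):
    # Start fully masked; iteratively commit the most confident predictions.
    tokens = torch.full((1, seq_len), mask_id)
    for step in range(steps):
        still_masked = tokens.eq(mask_id)
        if not still_masked.any():
            break
        logits = model(tokens)                        # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)  # only rank masked slots
        n_unmask = max(1, int(still_masked.sum()) // (steps - step))
        idx = conf.topk(n_unmask, dim=-1).indices     # most confident positions
        tokens.scatter_(1, idx, pred.gather(1, idx))  # commit those tokens
    return tokens

# Usage with a stand-in "model" (random logits over a 100-token vocab):
decoded = diffusion_decode(
    lambda t: torch.randn(t.shape[0], t.shape[1], 100),
    seq_len=32, mask_id=0)
```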
🔭 Towards Extending Open dLLMs to 131k Tokens
dLLMs behave differently from autoregressive models—they lack attention sinks, making long-context extension tricky. A few simple tweaks go a long way!! ✍️ blog https://t.co/Epf2y2Lnsk 💻 code https://t.co/c04Cj5iT1y
💬 5 · 🔁 50 · ❤️ 205
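As one concrete example of a "simple tweak" for long context (a common one in general, not necessarily the one this blog uses): position interpolation for RoPE, which rescales positions beyond the trained length back into the trained range.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                train_len: int = 4096, target_len: int = 131072):
    # Position interpolation: map position p in [0, target_len) back to
    # p * train_len / target_len, inside the range seen during training.
    scaled = positions.float() * (train_len / target_len)
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    return torch.outer(scaled, inv_freq)      # (seq, dim // 2) rotation angles

angles = rope_angles(torch.arange(131072), dim=64)
cos, sin = angles.cos(), angles.sin()         # feed into rotary attention as usual
```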
Hello world, we wanted to share an early preview of what we're building at @RadicalNumerics!
💬 3 · 🔁 11 · ❤️ 105
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to …
💬 102 · 🔁 259 · ❤️ 1K
Super excited to present our new work on hybrid architecture models—getting the best of Transformers and SSMs like Mamba—at #COLM2025! Come chat with @nick11roberts at poster session 2 on Tuesday. Thread below! (1)
💬 2 · 🔁 24 · ❤️ 70
✍️ Blog post: https://t.co/EZM7VaW0BZ 📅 Meeting signup: https://t.co/45OxVLGGf6 🎟️ Poster session: Tuesday 4:30–6:30 — come say hi!
nick11roberts.science
TL;DR: Manticore automatically builds hybrid language models by combining pretrained components (like Transformers and Mamba) instead of training from scratch. …
💬 0 · 🔁 0 · ❤️ 3
Excited for @COLM_conf next week! I'm taking meetings on scaling laws, hybrid LLMs (Transformer↔SSM/Mamba), and agents. 🎓 I'm also graduating and open to chatting about future opportunities. Grab a slot! https://t.co/45OxVLGGf6 FYI: Tue 4:30–6:30 I'll be at my poster #COLM2025
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
💬 0 · 🔁 14 · ❤️ 21
Just read this new paper from Anthropic’s very own Claude called “A Mathematical Theory of Communication” and my brain is broken 🤯
💬 0 · 🔁 0 · ❤️ 8
More data ≠ better fine-tuning. Sometimes the 'undertrained' model is more useful because it's still plastic. We need to map the capability vs. adaptability frontier better.
💬 14 · 🔁 19 · ❤️ 223
first i thought scaling laws originated in OpenAI (2020) then i thought they came from Baidu (2017) now i am enlightened: Scaling Laws were first explored at Bell Labs (1993)
💬 51 · 🔁 165 · ❤️ 2K
“Rubrics” have become a buzzword in AI, but the concept predates the hype. At @SnorkelAI, we’re excited to share a fun primer on what rubric-based evaluation is—and why it’s critical for today’s generative and agentic models.
💬 1 · 🔁 15 · ❤️ 33
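In miniature, rubric-based evaluation scores an output against explicit, named criteria and aggregates, instead of asking for one holistic grade. The criteria below are toy stand-ins; real rubric items are usually scored by an LLM or a human.

```python
from typing import Callable

# Each rubric item is a named, checkable criterion (toy keyword checks here).
RUBRIC: dict[str, Callable[[str], float]] = {
    "cites_sources": lambda out: 1.0 if "http" in out else 0.0,
    "within_length": lambda out: 1.0 if len(out.split()) <= 200 else 0.0,
    "no_hedging":    lambda out: 0.0 if "maybe" in out.lower() else 1.0,
}

def score(output: str) -> dict[str, float]:
    # Score every criterion separately, then aggregate to a total.
    per_criterion = {name: fn(output) for name, fn in RUBRIC.items()}
    per_criterion["total"] = sum(per_criterion.values()) / len(RUBRIC)
    return per_criterion

print(score("See https://example.com for details."))
```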
LLM judges are powerful for automated evaluation but expensive and biased. 📣 Meet PAJAMA, a new framework that distills LLM judging logic into a compact, executable form (a new representation), cutting costs from thousands of dollars to just cents. 🚀 We'll present at ICML PRAL on Friday!
💬 1 · 🔁 8 · ❤️ 27
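A hedged sketch of the idea as the tweet describes it: pay for one expensive judge call that writes the judging logic down as a program, then run that program on every example for cents. `call_llm` is a hypothetical stand-in for any LLM client, not PAJAMA's API.

```python
def distill_judge(call_llm, task_description: str):
    # One expensive judge call, amortized over the whole eval set: the LLM
    # writes its criteria down as executable code instead of judging each item.
    prompt = (f"Write a Python function judge(response: str) -> bool that "
              f"encodes your criteria for judging: {task_description}")
    code = call_llm(prompt)
    namespace: dict = {}
    exec(code, namespace)        # must be sandboxed/reviewed in a real system!
    return namespace["judge"]    # compact, executable judging logic

# Usage with a canned "LLM" so the sketch runs end to end:
fake_llm = lambda _prompt: "def judge(response):\n    return 'because' in response"
judge = distill_judge(fake_llm, "answers must include a justification")
print(judge("Yes, because the lemma holds."))  # True
```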
Next up this morning at #ICML2025, we will be presenting our work on pseudolabeling-based semi-supervised learning (SSL). East Exhibition Hall A&B, poster #E-1304, 11 am to 1:30 pm. Paper: https://t.co/o59Bh44MGU Pseudolabeling-based SSL relies on the model's confidence scores and …
💬 0 · 🔁 6 · ❤️ 14
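The pseudolabeling recipe the tweet builds on, in a few lines: keep the model's predictions on unlabeled data only when its confidence clears a threshold, then train on those predictions as if they were labels. A standard sketch, not the paper's method.

```python
import torch

def pseudolabel(model, unlabeled: torch.Tensor, threshold: float = 0.95):
    # Predict on unlabeled data and keep only high-confidence predictions.
    with torch.no_grad():
        probs = model(unlabeled).softmax(-1)        # (n, num_classes)
    conf, labels = probs.max(-1)
    keep = conf >= threshold                        # trust only confident preds
    return unlabeled[keep], labels[keep]

# Usage with a stand-in model to show the shapes:
x_unlab = torch.randn(1000, 20)
model = torch.nn.Linear(20, 5)
x_pseudo, y_pseudo = pseudolabel(model, x_unlab, threshold=0.5)
```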
Join us today in the morning poster session at #ICML2025. We will talk about some neat ways to reduce uncertainty and improve LLM accuracy at test-time on multi-choice tasks (e.g., tool selection) using conformal prediction and an additional inference round. 📍 East …
💬 0 · 🔁 5 · ❤️ 11
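For background, split conformal prediction on multiple-choice scores works like this: calibrate a score threshold on held-out labeled examples, then return a prediction set that contains the true choice with probability about 1 − α; the "additional inference round" would re-query the LLM on just that set. A sketch of the general technique, not the paper's exact procedure.

```python
import numpy as np

def calibrate(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float = 0.1):
    # Nonconformity score: 1 - probability assigned to the true choice.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample-corrected quantile for ~(1 - alpha) coverage.
    return np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def prediction_set(test_probs: np.ndarray, q: float):
    return np.where(1.0 - test_probs <= q)[0]       # choices that "conform"

cal_probs = np.random.dirichlet(np.ones(4), size=500)  # 500 cal examples, 4 choices
cal_labels = np.random.randint(0, 4, size=500)
q = calibrate(cal_probs, cal_labels)
print(prediction_set(np.array([0.5, 0.3, 0.15, 0.05]), q))
```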
Excited to be at ICML'25!!! I'll present papers on improving LLM inference and evaluation and on pseudolabeling-based semi-supervised learning. Come and say hi during these sessions, or chat anytime during the week! [C1]. Prune 'n Predict: Optimizing LLM Decision-making with …
💬 1 · 🔁 10 · ❤️ 24
Heading to #ICML! I’ll be representing SprocketLab at @UWMadison and @SnorkelAI. Reach out if you want to chat about data-centric AI, data development, agents, and foundation models.
💬 1 · 🔁 9 · ❤️ 39
📝 Paper: https://t.co/is0QgHBeKx 💻 Code: Soon! Huge thanks to my amazing co-authors @srguo24, @Zhiqi_Gao_2001, @srinath_namburi, @SonNicCr, @ChengjunWu75220, @ChengyuD27, and @fredsala, plus the @COLM_conf reviewers for their feedback! What hybrid would you build? [21/n]
arxiv.org
While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right...
💬 0 · 🔁 3 · ❤️ 6