Pranjal Aggarwal ✈️ COLM 🍁
@PranjalAggarw16
503 Followers · 1K Following · 39 Media · 88 Statuses
PhD student @LTIatCMU. Research scientist intern @AIatMeta FAIR. Working on reasoning, computer-use agents, and test-time compute. Prev @IITD
Joined August 2020
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o https://t.co/h6sTfywity 🧵
9 · 69 · 334
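The announcement above says the thinking length is controlled "via a prompt". As a minimal sketch of what that interface could look like, the helper below appends a natural-language token budget to the question; the exact instruction wording and function name are assumptions, not quoted from the L1 release.

```python
def length_controlled_prompt(question: str, target_tokens: int) -> str:
    """Build an L1-style prompt that requests a thinking budget.

    The precise phrasing of the budget instruction is an assumption
    here; the idea is simply that the budget is stated in the prompt
    and the RL-trained model learns to respect it.
    """
    return f"{question}\n\nThink for maximum {target_tokens} tokens."


# Hypothetical usage: this string would be sent to an L1 checkpoint.
prompt = length_controlled_prompt("Prove that 7 is prime.", 512)
print(prompt)
```

Varying `target_tokens` is then the whole control surface: no decoding-time truncation or special stop logic is needed on the caller's side.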
I will be at #COLM2025 this week. Reach out if you want to chat about LLM reasoning, computer-use agents, RL training, or actually anything! (DMs are open) I will also be presenting L1 (Controlling Reasoning Length through RL) tomorrow!
0 · 1 · 7
Thomas Zhu (@hanwen_zhu) from the L3 Lab was named a Siebel Scholar! He works on automating mathematical discovery, with a focus on integrating machine learning and the Lean proof assistant. Congrats!!
3 · 2 · 43
Thanks to @omarsar0 for highlighting our work! 🎉 DualDistill has been accepted to EMNLP 2025! While training LLMs to use tools isn’t entirely new, we managed to train a 7B model on just 4×A6000 GPUs that dynamically leverages tools for advanced reasoning. https://t.co/UyC0YNqIfR
Agentic-R1: this 7B model is surprisingly good at interleaving tool use and reasoning. It's fun to see small language models improving this fast. Knowledge distillation on full display. Here are my notes:
0 · 2 · 14
🌀New Test-time scaling method 🌀 📝: https://t.co/yqWvOMZpwq - Use RL to train an LLM solution aggregator – Reasons, reviews, reconciles, and synthesizes a final solution -> Much better than existing techniques! - Simple new method. Strong results across 4 math benchmarks. 🧵1/5
2 · 117 · 706
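The aggregator tweet above describes an LLM that reviews several sampled solutions and synthesizes a final one. A minimal sketch of the inference-time side is just building the aggregation prompt from the candidates; the RL training of the aggregator itself, and all naming here, are out of scope / assumptions.

```python
def aggregation_prompt(problem: str, candidates: list[str]) -> str:
    # Format k sampled solutions for an aggregator LLM that is asked
    # to review, reconcile, and synthesize a single final solution.
    numbered = "\n\n".join(
        f"Solution {i + 1}:\n{sol}" for i, sol in enumerate(candidates)
    )
    return (
        f"Problem:\n{problem}\n\n"
        f"Candidate solutions:\n{numbered}\n\n"
        "Review the candidates, reconcile any disagreements, and write "
        "a single final solution."
    )
```

Compared with majority voting, this lets the aggregator salvage a correct step from an otherwise wrong candidate, which is presumably why it can beat existing test-time-scaling techniques.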
Eight CMU Ph.D. students received the @SoftBank–@Arm Fellowship to support research at the intersection of AI and human collaboration. 👏 The program builds on CMU’s relationship with Keio University, part of a $110 million effort to advance AI research. https://t.co/h5XRcR01am
2 · 7 · 30
Got a new efficient / optimally-thinking LLM? Does your model answer simple queries quickly and spend compute on the harder ones? Test it on our new benchmark, OptimalThinkingBench! 👇 Work led by the amazing @PranjalAggarw16 during his internship!
🤖Introducing OptimalThinkingBench 🤖 📝: https://t.co/aufQVJp8aC - Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform. - We introduce a benchmark which scores models in the quest to find the best mix. - OptimalThinkingBench reports the F1
0 · 10 · 79
🤖Introducing OptimalThinkingBench 🤖 📝: https://t.co/aufQVJp8aC - Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform. - We introduce a benchmark which scores models in the quest to find the best mix. - OptimalThinkingBench reports the F1
1 · 72 · 417
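The tweet is cut off at "reports the F1", but the F1 itself is the standard harmonic mean of two [0, 1] scores. As a sketch: plausibly one score penalizes overthinking on easy queries and the other penalizes underthinking on hard ones, though which two sub-scores the benchmark actually combines is an assumption here.

```python
def f1(overthinking_score: float, underthinking_score: float) -> float:
    # Standard F1: harmonic mean of two scores in [0, 1]. It is high
    # only when BOTH are high, so a model cannot win by thinking long
    # everywhere (or never thinking) and maxing out just one score.
    if overthinking_score + underthinking_score == 0:
        return 0.0
    return (2 * overthinking_score * underthinking_score
            / (overthinking_score + underthinking_score))
```

The harmonic mean is the point of the design: it rewards exactly the "best mix" of thinking and non-thinking behavior that the benchmark is after.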
Excited about CMU's new Institute for Computer-Aided Reasoning in Mathematics (ICARM), a new NSF Mathematical Sciences Research Institute. I'm honored to serve as an Assistant Director focusing on machine learning and mathematics.
A new federally funded national institute at CMU will help mathematicians use AI to make mathematical reasoning faster and more reliable in solving pressing challenges across science, security and the economy. Read more, and scroll for further details:
8 · 23 · 173
Can LLMs self-improve on code generation? Check out our work AlphaVerus, where the model generates provably correct code and self-improves without any weight updates! At #ICML2025 today: 📆: 11:00 AM - 1:30 PM 📍: Poster #East-2912 https://t.co/53AIFOaEBY w/ Bryan, @wellecks
0 · 10 · 57
I will be at #ICML2025 this week. Reach out if you want to chat about LLM reasoning, computer-use agents, code gen, or actually anything! (DMs are open) I will also be presenting AlphaVerus (self-improving verified code gen) this Thursday!
0 · 1 · 15
For such large-scale training, we created Massive-Math-455K by merging all major math datasets with verifiable answers, removing duplicates, synthetic problems, proofs, MCQs, invalid answers, and questions with image links. Dataset Link: https://t.co/a7GflVxFJI w/ @wellecks
0 · 0 · 3
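The Massive-Math-455K tweet lists the filtering criteria (drop MCQs, proofs, image-linked questions, invalid answers). A minimal per-example filter in that spirit might look like the sketch below; every heuristic and name here is an assumption, not the actual pipeline.

```python
import re


def keep_problem(question: str, answer: str) -> bool:
    """Filter-rule sketch for a Massive-Math-455K-style corpus.

    Mirrors the criteria in the announcement; the concrete heuristics
    (regexes, prefixes) are illustrative assumptions only.
    """
    if not answer.strip():
        return False  # invalid / empty answer, not verifiable
    if re.search(r"https?://\S+\.(png|jpe?g|gif)", question, re.I):
        return False  # question depends on an image link
    if "(A)" in question and "(B)" in question:
        return False  # looks like a multiple-choice question
    if question.lower().startswith(("prove", "show that")):
        return False  # proof-style problem, no checkable final answer
    return True
```

Deduplication across the merged source datasets would then run on the surviving examples, e.g. by hashing a whitespace-normalized question string.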
L1-1.5B-Short outputs < 512 tokens without special prompting. Thanks to its efficiency, we were able to RL-train it for over 1100 steps (batch size 1024) with GRPO. Despite its small size, it achieves strong performance and runs in real time on laptops and phones. Model Link:
1 · 1 · 4
L1 demonstrated that reasoning models can excel at short reasoning lengths, showing self-correction and backtracking while outperforming larger models at identical token budgets. We took this idea further and trained a dedicated Short-Reasoning Model (SRM): L1-1.5B-Short.
1 · 1 · 3
We scaled the L1 training approach to Deepseek-Distilled-Qwen7B and Qwen3-8B. Both models show excellent length control and significantly outperform existing baselines. Model Link: https://t.co/84jempio04 🧵
1 · 0 · 3
Super excited to see L1 accepted to #COLM2025! We are further open-sourcing 5 new models & a dataset: 1. L1-7B & L1-8B: Exact and Max variants 2. L1-1.5B-Short: Short reasoning model (SRM), RL-trained on 1.2M data points 3. Massive-Math-455K: A clean, unified math dataset 🧵
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o https://t.co/h6sTfywity 🧵
1 · 3 · 18
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too, until we looked at the reported numbers for the pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in the blog below 🧵👇
33 · 124 · 875
AlphaVerus has been accepted at #ICML2025! https://t.co/iLicoq1uj8
https://t.co/xi5kJrbxE9 We've seen in math that good verification (e.g., Lean) unlocks surprising capabilities–why not for code too? AlphaVerus puts LLMs & Rust’s Verus verifier into a self-improving loop–lots
We present AlphaVerus, which enables LLMs to generate provably correct Rust code via a new tree search and self-improvement loop Very excited about AlphaVerus as a starting point for truly trustworthy code generation. Amazing work by @PranjalAggarw16! https://t.co/iLicoq228G
5 · 9 · 82
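The AlphaVerus threads above describe putting an LLM and the Verus verifier into a self-improving loop. Stripped of the tree search and self-improvement machinery, the inner loop is generate → verify → refine on verifier feedback; the sketch below shows only that skeleton, with `generate` and `verify` (e.g. a call out to Verus) left as caller-supplied stubs.

```python
from typing import Callable, Optional, Tuple


def refine_until_verified(
    generate: Callable[[str, str], str],        # (spec, feedback) -> candidate
    verify: Callable[[str], Tuple[bool, str]],  # candidate -> (ok, error msg)
    spec: str,
    max_iters: int = 5,
) -> Optional[str]:
    """Minimal generate-verify-refine loop in the spirit of AlphaVerus.

    The real system layers tree search and self-improvement on top of
    this; here the verifier's error message is simply fed back into
    the next generation attempt.
    """
    feedback = ""
    for _ in range(max_iters):
        candidate = generate(spec, feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate  # verifier accepted the candidate
    return None  # budget exhausted without a verified program
```

Because acceptance is gated on the verifier rather than on model confidence, anything the loop returns carries a proof of correctness, which is what makes the "no weight updates" self-improvement trustworthy.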
Cool to see our L1 ( https://t.co/KcUXPIZXxO) methodology used here! And a nice insight about using the controllable reasoning budget to enable more efficient use of inference hardware
With INTELLECT-2 we aim for frontier reasoning performance with a controllable thinking budget. By incorporating length rewards into our training run, users can specify how long the model should reason for a given task. https://t.co/K6MctjHcXX
3 · 10 · 97
The recent Claude 3.7 model from Anthropic lets you control the budget for thinking—how might this work? Check out L1, our fully open recipe for training reasoning models with controllable thinking budgets!
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o https://t.co/h6sTfywity 🧵
4 · 7 · 75
We are open-sourcing everything! Including Code, Models and Outputs! Check out our website and paper for more details! Website: https://t.co/h6sTfywity Code: https://t.co/NhGT9Srjdn Paper: https://t.co/khJ7uD74FL Models: https://t.co/84jemphQaw w/ @wellecks
0 · 2 · 16