Pranjal Aggarwal ✈️ COLM 🍁
@PranjalAggarw16
503 Followers · 1K Following · 39 Media · 88 Statuses
PhD student @LTIatCMU. Research scientist intern @AIatMeta FAIR. Working on reasoning, computer-use agents, and test-time compute. Prev @IITD
Joined August 2020
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o https://t.co/h6sTfywity 🧵
9 · 69 · 334
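The announcement above says the thinking length is controlled "via a prompt". As a minimal sketch of what that interface could look like, the helper below appends a natural-language token budget to the question; the exact instruction wording and function name are assumptions, not quoted from the L1 release.

```python
def length_controlled_prompt(question: str, target_tokens: int) -> str:
    """Build an L1-style prompt that requests a thinking budget.

    The precise phrasing of the budget instruction is an assumption
    here; the idea is simply that the budget is stated in the prompt
    and the RL-trained model learns to respect it.
    """
    return f"{question}\n\nThink for maximum {target_tokens} tokens."


# Hypothetical usage: this string would be sent to an L1 checkpoint.
prompt = length_controlled_prompt("Prove that 7 is prime.", 512)
print(prompt)
```

Varying `target_tokens` is then the whole control surface: no decoding-time truncation or special stop logic is needed on the caller's side.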
I will be at #COLM2025 this week. Reach out if you want to chat about LLM reasoning, computer-use agents, RL training, or actually anything! (DMs are open) I will also be presenting L1 (Controlling Reasoning Length through RL) tomorrow!
0 · 1 · 7
Thomas Zhu (@hanwen_zhu) from the L3 Lab was named a Siebel Scholar! He works on automating mathematical discovery, with a focus on integrating machine learning and the Lean proof assistant. Congrats!!
3 · 2 · 43
Thanks to @omarsar0 for highlighting our work! 🎉 DualDistill has been accepted to EMNLP 2025! While training LLMs to use tools isn’t entirely new, we managed to train a 7B model on just 4×A6000 GPUs that dynamically leverages tools for advanced reasoning. https://t.co/UyC0YNqIfR
Agentic-R1: this 7B model is surprisingly good at interleaving tool use and reasoning. It's fun to see small language models improving this fast. Knowledge distillation on full display. Here are my notes:
0 · 2 · 14
🌀New Test-time scaling method 🌀 📝: https://t.co/yqWvOMZpwq - Use RL to train an LLM solution aggregator – Reasons, reviews, reconciles, and synthesizes a final solution -> Much better than existing techniques! - Simple new method. Strong results across 4 math benchmarks. 🧵1/5
2 · 117 · 706
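The aggregator tweet above describes an LLM that reviews several sampled solutions and synthesizes a final one. A minimal sketch of the inference-time side is just building the aggregation prompt from the candidates; the RL training of the aggregator itself, and all naming here, are out of scope / assumptions.

```python
def aggregation_prompt(problem: str, candidates: list[str]) -> str:
    # Format k sampled solutions for an aggregator LLM that is asked
    # to review, reconcile, and synthesize a single final solution.
    numbered = "\n\n".join(
        f"Solution {i + 1}:\n{sol}" for i, sol in enumerate(candidates)
    )
    return (
        f"Problem:\n{problem}\n\n"
        f"Candidate solutions:\n{numbered}\n\n"
        "Review the candidates, reconcile any disagreements, and write "
        "a single final solution."
    )
```

Compared with majority voting, this lets the aggregator salvage a correct step from an otherwise wrong candidate, which is presumably why it can beat existing test-time-scaling techniques.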
Eight CMU Ph.D. students received the @SoftBank–@Arm Fellowship to support research at the intersection of AI and human collaboration. 👏 The program builds on CMU’s relationship with Keio University, part of a $110 million effort to advance AI research. https://t.co/h5XRcR01am
2 · 7 · 30
Got a new efficient / optimally-thinking LLM? Does your model answer simple queries quickly and spend compute on the harder ones? Test it on our new benchmark, OptimalThinkingBench! 👇 Work led by the amazing @PranjalAggarw16 during his internship!
🤖Introducing OptimalThinkingBench 🤖 📝: https://t.co/aufQVJp8aC - Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform. - We introduce a benchmark which scores models in the quest to find the best mix. - OptimalThinkingBench reports the F1
0 · 10 · 79
🤖Introducing OptimalThinkingBench 🤖 📝: https://t.co/aufQVJp8aC - Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform. - We introduce a benchmark which scores models in the quest to find the best mix. - OptimalThinkingBench reports the F1
1 · 72 · 417
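The tweet is cut off at "reports the F1", but the F1 itself is the standard harmonic mean of two [0, 1] scores. As a sketch: plausibly one score penalizes overthinking on easy queries and the other penalizes underthinking on hard ones, though which two sub-scores the benchmark actually combines is an assumption here.

```python
def f1(overthinking_score: float, underthinking_score: float) -> float:
    # Standard F1: harmonic mean of two scores in [0, 1]. It is high
    # only when BOTH are high, so a model cannot win by thinking long
    # everywhere (or never thinking) and maxing out just one score.
    if overthinking_score + underthinking_score == 0:
        return 0.0
    return (2 * overthinking_score * underthinking_score
            / (overthinking_score + underthinking_score))
```

The harmonic mean is the point of the design: it rewards exactly the "best mix" of thinking and non-thinking behavior that the benchmark is after.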
Excited about CMU's new Institute for Computer-Aided Reasoning in Mathematics (ICARM), a new NSF Mathematical Sciences Research Institute. I'm honored to serve as an Assistant Director focusing on machine learning and mathematics.
A new federally funded national institute at CMU will help mathematicians use AI to make mathematical reasoning faster and more reliable in solving pressing challenges across science, security and the economy. Read more, and scroll for further details:
8 · 23 · 173
Can LLMs self-improve on code generation? Check out our work AlphaVerus, where the model generates provably correct code and self-improves without any weight updates! At #ICML2025 today: 📆: 11:00 AM - 1:30 PM 📍: Poster #East-2912 https://t.co/53AIFOaEBY w/ Bryan, @wellecks
0 · 10 · 57
I will be at #ICML2025 this week. Reach out if you want to chat about LLM reasoning, computer-use agents, code gen, or actually anything! (DMs are open) I will also be presenting AlphaVerus (self-improving verified code gen) this Thursday!
0 · 1 · 15
For such large-scale training, we created Massive-Math-455K by merging all major math datasets with verifiable answers, removing duplicates, synthetic problems, proofs, MCQs, invalid answers, and questions with image links. Dataset Link: https://t.co/a7GflVxFJI w/ @wellecks
0 · 0 · 3
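The Massive-Math-455K tweet lists the filtering criteria (drop MCQs, proofs, image-linked questions, invalid answers). A minimal per-example filter in that spirit might look like the sketch below; every heuristic and name here is an assumption, not the actual pipeline.

```python
import re


def keep_problem(question: str, answer: str) -> bool:
    """Filter-rule sketch for a Massive-Math-455K-style corpus.

    Mirrors the criteria in the announcement; the concrete heuristics
    (regexes, prefixes) are illustrative assumptions only.
    """
    if not answer.strip():
        return False  # invalid / empty answer, not verifiable
    if re.search(r"https?://\S+\.(png|jpe?g|gif)", question, re.I):
        return False  # question depends on an image link
    if "(A)" in question and "(B)" in question:
        return False  # looks like a multiple-choice question
    if question.lower().startswith(("prove", "show that")):
        return False  # proof-style problem, no checkable final answer
    return True
```

Deduplication across the merged source datasets would then run on the surviving examples, e.g. by hashing a whitespace-normalized question string.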
L1-1.5B-Short outputs < 512 tokens without special prompting. Thanks to its efficiency, we were able to RL-train it for over 1100 steps (batch size 1024) with GRPO. Despite its small size, it achieves strong performance and runs in real time on laptops and phones. Model Link:
1 · 1 · 4
L1 demonstrated that reasoning models can excel at short reasoning lengths, showing self-correction and backtracking while outperforming larger models at identical token budgets. We took this idea further and trained a dedicated Short-Reasoning Model (SRM): L1-1.5B-Short.
1 · 1 · 3
We scaled the L1 training approach to Deepseek-Distilled-Qwen7B and Qwen3-8B. Both models show excellent length control and significantly outperform existing baselines. Model Link: https://t.co/84jempio04 🧵
1 · 0 · 3
Super excited to see L1 accepted to #COLM2025! We are further open-sourcing 5 new models & a dataset: 1. L1-7B & L1-8B: Exact and Max variants 2. L1-1.5B-Short: Short reasoning model (SRM), RL-trained on 1.2M data points 3. Massive-Math-455K: A clean, unified math dataset 🧵
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o https://t.co/h6sTfywity 🧵
1 · 3 · 18
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too, until we looked at the reported numbers for the pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in the blog below 🧵👇
33 · 124 · 875
AlphaVerus has been accepted at #ICML2025! https://t.co/iLicoq1uj8
https://t.co/xi5kJrbxE9 We've seen in math that good verification (e.g., Lean) unlocks surprising capabilities–why not for code too? AlphaVerus puts LLMs & Rust’s Verus verifier into a self-improving loop–lots
We present AlphaVerus, which enables LLMs to generate provably correct Rust code via a new tree search and self-improvement loop Very excited about AlphaVerus as a starting point for truly trustworthy code generation. Amazing work by @PranjalAggarw16! https://t.co/iLicoq228G
5 · 9 · 82
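The AlphaVerus threads above describe putting an LLM and the Verus verifier into a self-improving loop. Stripped of the tree search and self-improvement machinery, the inner loop is generate → verify → refine on verifier feedback; the sketch below shows only that skeleton, with `generate` and `verify` (e.g. a call out to Verus) left as caller-supplied stubs.

```python
from typing import Callable, Optional, Tuple


def refine_until_verified(
    generate: Callable[[str, str], str],        # (spec, feedback) -> candidate
    verify: Callable[[str], Tuple[bool, str]],  # candidate -> (ok, error msg)
    spec: str,
    max_iters: int = 5,
) -> Optional[str]:
    """Minimal generate-verify-refine loop in the spirit of AlphaVerus.

    The real system layers tree search and self-improvement on top of
    this; here the verifier's error message is simply fed back into
    the next generation attempt.
    """
    feedback = ""
    for _ in range(max_iters):
        candidate = generate(spec, feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate  # verifier accepted the candidate
    return None  # budget exhausted without a verified program
```

Because acceptance is gated on the verifier rather than on model confidence, anything the loop returns carries a proof of correctness, which is what makes the "no weight updates" self-improvement trustworthy.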
Cool to see our L1 ( https://t.co/KcUXPIZXxO) methodology used here! And a nice insight about using the controllable reasoning budget to enable more efficient use of inference hardware
With INTELLECT-2 we aim for frontier reasoning performance with a controllable thinking budget. By incorporating length rewards into our training run, users can specify how long the model should reason for a given task. https://t.co/K6MctjHcXX
3 · 10 · 97
The recent Claude 3.7 model from Anthropic lets you control the budget for thinking—how might this work? Check out L1, our fully open recipe for training reasoning models with controllable thinking budgets!
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o https://t.co/h6sTfywity 🧵
4 · 7 · 75
We are open-sourcing everything! Including Code, Models and Outputs! Check out our website and paper for more details! Website: https://t.co/h6sTfywity Code: https://t.co/NhGT9Srjdn Paper: https://t.co/khJ7uD74FL Models: https://t.co/84jemphQaw w/ @wellecks
0 · 2 · 16