Sumeet Motwani

@sumeetrm

Followers: 2K
Following: 4K
Media: 51
Statuses: 363

ML PhD at Oxford, Previously CS at UC Berkeley

Bay Area, CA
Joined February 2024
@sumeetrm
Sumeet Motwani
11 days
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡 > RL on existing datasets saturates very quickly > Reasoning over
10
47
280
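The announcement describes scaling RL on existing data by composing short problems into longer horizons. As a minimal sketch of what such chaining could look like (the toy templates, the `{x}` carry-forward format, and the `make_chained_task` helper are all hypothetical illustrations, not the paper's actual pipeline):

```python
import random

def make_chained_task(problems, k, seed=0):
    """Compose k short problems into one long-horizon task: the numeric
    answer of step i is substituted into step i+1. `problems` is a list of
    (template, solve_fn) pairs; templates carry the value via an {x} slot."""
    rng = random.Random(seed)
    chain = [rng.choice(problems) for _ in range(k)]
    value = rng.randint(1, 10)  # initial value fed into the first step
    steps = []
    for template, solve in chain:
        steps.append(template.format(x=value))
        value = solve(value)
    prompt = "Solve step by step, feeding each answer forward:\n" + "\n".join(
        f"{i + 1}. {s}" for i, s in enumerate(steps))
    return prompt, value  # final answer is the outcome-only reward target

# Toy GSM8K-style templates (made-up stand-ins for real short-horizon data).
problems = [
    ("Take {x} apples and buy 3 more. How many now?", lambda v: v + 3),
    ("Double your {x} coins. How many coins?", lambda v: v * 2),
]
prompt, answer = make_chained_task(problems, k=4)
```

Because only the final `answer` is checked, the reward stays outcome-only even as the chain grows.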
@thomasfbloom
Thomas Bloom
3 days
@kevinweil Hi, as the owner/maintainer of https://t.co/69gOJM7Ci7, this is a dramatic misrepresentation. GPT-5 found references, which solved these problems, that I personally was unaware of. The 'open' status only means I personally am unaware of a paper which solves it.
22
142
3K
@satwik1729
Satwik Bhattamishra
3 days
Excited to share our new work on the expressivity of Transformer-based multi-agent systems and understanding the trade-offs in communication, no. of agents, and achievable speedups ✨ Work led by @frisbeemortel; check out his thread for details!
@frisbeemortel
Michael Rizvi-Martel
3 days
Is there such a thing as too many agents in multi-agent systems? It depends! 🧵 Our work reveals 3 distinct regimes where communication patterns differ dramatically. More on our findings below 👇 (1/7)
0
4
12
@sumeetrm
Sumeet Motwani
3 days
I strongly believe that good reasoning work should test on Instruct models, and not just base models. Otherwise, any gains you see are probably from better instruction tuning + performance already present in the model... Be careful out there, readers.
1
1
21
@akshitwt
Akshit
3 days
the one thing this definition misses is reliability; models may be great at solving very hard problems at pass@K, but that is not very reliable for a usable agent. same problem w/ METR - they calculate time horizon at 50% accuracy, which is too low to be useful. pass@K is good
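For context on the pass@K being debated here: it is commonly computed with the unbiased estimator from the HumanEval evaluation setup, which draws n samples, counts c correct, and estimates 1 - C(n-c, k)/C(n, k). A quick sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples (drawn without replacement from n attempts, c of which are
    correct) succeeds. When c > n - k, success is certain."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 attempts and 3 successes, pass@1 is just the success rate (~0.3),
# while pass@5 is much higher, illustrating the reliability gap in the tweet.
low, high = pass_at_k(10, 3, 1), pass_at_k(10, 3, 5)
```

The gap between `low` and `high` is exactly the point above: strong pass@K with weak pass@1 is a real capability but not yet a reliable agent.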
@DanHendrycks
Dan Hendrycks
4 days
The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵
0
1
19
@sumeetrm
Sumeet Motwani
4 days
RL with a curriculum can teach new skills, and these lead to nice improvements on much harder math benchmarks and entirely OOD reasoning tasks (ReasoningGym/Long-context benchmarks)! Should be exciting to scale this up further
@sumeetrm
Sumeet Motwani
11 days
Importantly, our results surpass standard RL training on the same underlying dataset and the instruct model even at a very high pass@k, teaching new long-horizon reasoning capabilities! @YangYue_THU and @_AndrewZhao’s work on studying LLM reasoning capabilities beyond the base
1
0
12
@sumeetrm
Sumeet Motwani
10 days
One very interesting result from our work is that RL using a curriculum can teach novel capabilities that you can't elicit otherwise (even at very high pass@k) Curriculum learning is back! @jxmnop
3
14
141
@siddarthv66
Siddarth Venkatraman
4 days
I disagree with this idea that RL for LLMs is only capable of mode sharpening. 1) This mode sharpening intuition is probably true for shallow RL training (single-task, fewer steps). 2) However for long RL training with a good curriculum of reasoning tasks, the model can start
@Joanvelja
Joan Velja
6 days
@rosinality I’ve always had the following intuition pump for it: If a policy pi can sample at least 1 successful trajectory out of N draws, this means that the distribution of the task (however fuzzy/well-defined it is) is contained within the policy. RL via outcome based reward hence does
11
5
112
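That intuition pump can be made concrete with a toy two-action softmax policy: as long as the successful trajectory has nonzero probability, REINFORCE with a 0/1 outcome reward keeps shifting mass onto it. A minimal sketch (entirely a toy setup with made-up hyperparameters, not anyone's actual training code):

```python
import math
import random

def reinforce_toy(p_success=0.05, lr=0.5, steps=300, seed=0):
    """Softmax policy over two 'trajectories' with logits (a, b); action 0
    earns reward 1, action 1 earns reward 0. Update: lr * reward * grad log pi,
    where grad log pi wrt logit i is 1[i == action] - pi_i."""
    rng = random.Random(seed)
    # Initialize logits so that pi(success) = p_success.
    a, b = math.log(p_success), math.log(1 - p_success)
    for _ in range(steps):
        pa = math.exp(a) / (math.exp(a) + math.exp(b))
        action = 0 if rng.random() < pa else 1
        reward = 1.0 if action == 0 else 0.0
        ga = (1 - pa) if action == 0 else -pa  # d log pi / d a
        gb = -(1 - pa) if action == 0 else pa  # d log pi / d b
        a += lr * reward * ga
        b += lr * reward * gb
    return math.exp(a) / (math.exp(a) + math.exp(b))

final = reinforce_toy()
```

Since the reward is zero on failures, every update comes from a sampled success, so the successful mode's probability can only grow from its starting point.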
@eisokant
Eiso Kant
5 days
We believe that to compete at the frontier, you have to own the full stack: from dirt to intelligence. Today we’re announcing two major unlocks for our mission to AGI: 1. We're partnering with @CoreWeave and have 40,000+ NVIDIA GB300s secured. First capacity comes online
33
48
412
@Waymo
Waymo
5 days
We’re bringing the magic of Waymo to Londoners in 2026 💂🇬🇧
264
451
4K
@ShashwatGoel7
Shashwat Goel
9 days
Crazy good :0 transfer from their new long horizon training method, trained on compositions of GSM8K, gives large gains across reasoning gym tasks.
@sumeetrm
Sumeet Motwani
9 days
Just tested on some Reasoning Gym domains! h1 on long-horizon GSM transfers to:
Propositional logic: Instruct model 22.9%, h1 training 47.1%
Graphs (largest island): Instruct model 15%, h1 training 22.5%
Algorithmic (sentence reordering): Instruct 9.6%, h1 training 18.8%
1
1
16
@sumeetrm
Sumeet Motwani
9 days
Reasoning Gym domains - h1 on long-horizon GSM transfers to:
Propositional logic: Instruct model 22.9%, h1 training 47.1%
Graphs (largest island): Instruct model 15%, h1 training 22.5%
Algorithmic (sentence reordering): Instruct 9.6%, h1 training 18.8%
Algorithmic (manipulate
0
1
3
@casdewitt
Christian Schroeder de Witt
10 days
Emerging from presenting MALT: Improving reasoning with multi-agent LLM training @COLM2025 to share the next work on reasoning: this time, showing that long-horizon reasoning can be significantly improved by curriculum training on chained tasks. Fantastic efforts led by
0
2
8
@AmyPrb
Ameya P.
11 days
Must read paper! 👇 Was super interested in the degree of improvement in long-horizon performance even on pass@k
0
1
2
@sumeetrm
Sumeet Motwani
10 days
Thanks @ShashwatGoel7!
@ShashwatGoel7
Shashwat Goel
10 days
The new wave of RL papers is finally getting interesting. This one: training models on compositions of short reasoning tasks generalizes to them becoming better at other longer reasoning tasks. Here, you can train on easy gsm data -> gain on hard math. @sumeetrm et al. cooked.
0
0
1
@sumeetrm
Sumeet Motwani
10 days
Composing short-horizon data to build long-horizon curriculums can be scaled a really long way!
@akshitwt
Akshit
11 days
the next era of important LLM applications will require long horizon capabilities new cool paper on how to improve those capabilities by (re)using existing data! check out below
0
0
3
@_AndrewZhao
Andrew Zhao
10 days
Free lunch to turn your models from short -> long horizon reasoners Congratz on the release! @sumeetrm
1
1
16
@rohanpaul_ai
Rohan Paul
11 days
New @Microsoft + Princeton + Oxford paper shows how to train LLMs for long multi-step reasoning without any new labeled data, by chaining short problems and using outcome-only reinforcement learning with a growing-length curriculum. The big deal is that long-horizon skills
5
40
222
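The summary above mentions a growing-length curriculum: start RL on short chains and lengthen them as the model's success rate rises. One simple way such a schedule could be wired up (the promotion threshold, window size, and length bounds here are made-up knobs for illustration, not the paper's settings):

```python
class GrowingLengthCurriculum:
    """Track recent success at the current chain length; once the rolling
    success rate clears a threshold, promote training to longer chains."""

    def __init__(self, start_len=2, max_len=16, threshold=0.7, window=100):
        self.length, self.max_len = start_len, max_len
        self.threshold, self.window = threshold, window
        self.recent = []  # rolling record of 0/1 outcomes at this length

    def record(self, success: bool) -> None:
        self.recent.append(1 if success else 0)
        if len(self.recent) > self.window:
            self.recent.pop(0)
        full = len(self.recent) == self.window
        if full and sum(self.recent) / self.window >= self.threshold:
            self.length = min(self.length + 1, self.max_len)
            self.recent.clear()  # re-measure success at the new length

cur = GrowingLengthCurriculum()
for _ in range(100):  # 100 straight successes at length 2...
    cur.record(True)
# ...promote the curriculum to chains of length 3
```

Clearing the rolling window on promotion matters: success rates measured at the old, shorter length should not count toward promoting again at the new one.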