
Sumeet Motwani
@sumeetrm
Followers
2K
Following
4K
Media
51
Statuses
363
ML PhD at Oxford, Previously CS at UC Berkeley
Bay Area, CA
Joined February 2024
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡
> RL on existing datasets saturates very quickly
> Reasoning over
10
47
280
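A minimal sketch of the chaining idea in the announcement above, assuming the method composes existing short problems so that each step's answer feeds the next, yielding one long-horizon task with a single checkable final answer. `Problem` and `chain` are hypothetical names for illustration, not the paper's API:

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Problem:
    template: str                 # e.g. "Add 7 to {x}."
    solve: Callable[[int], int]   # ground-truth function of the injected value

def chain(problems: list[Problem], horizon: int, seed_value: int) -> tuple[str, int]:
    """Thread each step's answer into the next problem, producing one
    long-horizon prompt with a single outcome-checkable final answer."""
    steps, value = [], seed_value
    for p in random.sample(problems, horizon):
        steps.append(p.template.format(x=value))
        value = p.solve(value)    # answer becomes the next step's input
    prompt = "Solve the steps in order; each answer feeds the next.\n" + "\n".join(steps)
    return prompt, value

# e.g. two toy GSM-style steps chained into a horizon-2 task
probs = [Problem("Add 7 to {x}.", lambda v: v + 7),
         Problem("Double {x}.", lambda v: 2 * v)]
prompt, answer = chain(probs, horizon=2, seed_value=3)
```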
@kevinweil Hi, as the owner/maintainer of https://t.co/69gOJM7Ci7, this is a dramatic misrepresentation. GPT-5 found references, which I was personally unaware of, that solved these problems. The 'open' status only means that I personally am unaware of a paper which solves it.
22
142
3K
Excited to share our new work on the expressivity of Transformer-based multi-agent systems and understanding the trade-offs in communication, no. of agents, and achievable speedups ✨ Work led by @frisbeemortel; check out his thread for details!
Is there such a thing as too many agents in multi-agent systems? It depends! 🧵 Our work reveals 3 distinct regimes where communication patterns differ dramatically. More on our findings below 👇 (1/7)
0
4
12
I strongly believe that good reasoning work should test on Instruct models, and not just base models. Otherwise, any gains you see are probably from better instruction tuning + performance already present in the model... Be careful out there, readers.
1
1
21
the one thing this definition misses is reliability; models may be great at solving very hard problems at pass@K, but that is not very reliable for a usable agent. same problem w/ METR - they calculate time horizon at 50% accuracy, which is too low to be useful. pass@K is good
The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵
0
1
19
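For the reliability point above, the standard unbiased pass@k estimator (Chen et al., 2021) makes explicit how high pass@k can coexist with low per-attempt accuracy; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: probability that at least one of k
    samples is correct, given c correct out of n total samples."""
    if n - c < k:
        return 1.0                # fewer failures than draws: guaranteed hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 5 correct out of 100 samples: pass@1 = 0.05 but pass@64 ≈ 0.995,
# so a model can look strong at high k while being unreliable per attempt.
print(pass_at_k(100, 5, 1), pass_at_k(100, 5, 64))
```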
RL with a curriculum can teach new skills, and these lead to nice improvements on much harder math benchmarks and entirely OOD reasoning tasks (ReasoningGym/Long-context benchmarks)! Should be exciting to scale this up further
Importantly, our results surpass both standard RL training on the same underlying dataset and the instruct model, even at a very high pass@k, teaching new long-horizon reasoning capabilities! @YangYue_THU and @_AndrewZhao’s work on studying LLM reasoning capabilities beyond the base
1
0
12
One very interesting result from our work is that RL using a curriculum can teach novel capabilities that you can't elicit otherwise (even at very high pass@k). Curriculum learning is back! @jxmnop
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡
> RL on existing datasets saturates very quickly
> Reasoning over
3
14
141
I disagree with this idea that RL for LLMs is only capable of mode sharpening.
1) This mode-sharpening intuition is probably true for shallow RL training (single-task, fewer steps).
2) However, for long RL training with a good curriculum of reasoning tasks, the model can start
@rosinality I’ve always had the following intuition pump for it: if a policy pi can sample at least 1 successful trajectory out of N draws, this means that the distribution of the task (however fuzzy/well-defined it is) is contained in the policy. RL via outcome-based reward hence does
11
5
112
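A toy illustration of that intuition pump, assuming plain outcome-reward REINFORCE on a two-action bandit standing in for a policy that only rarely samples the successful trajectory; purely a sketch under those assumptions, not the thread's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([3.0, 0.0])   # the "successful trajectory" (action 1) starts rare
correct, lr = 1, 0.5

for _ in range(500):
    p = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(2, p=p)
    reward = 1.0 if a == correct else 0.0
    grad = -p
    grad[a] += 1.0              # gradient of log p(a) w.r.t. the logits
    logits += lr * reward * grad  # outcome-only: zero reward means no update

p = np.exp(logits) / np.exp(logits).sum()
print(p)                        # mass has shifted onto the once-rare success
```

Because the update fires only on sampled successes, any behavior the policy can reach at all (pass@N > 0) gets upweighted, which is the sense in which the task distribution is already contained in the policy.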
We believe that to compete at the frontier, you have to own the full stack: from dirt to intelligence. Today we’re announcing two major unlocks for our mission to AGI: 1. We're partnering with @CoreWeave and have 40,000+ NVIDIA GB300s secured. First capacity comes online
33
48
412
Crazy good transfer :0 Their new long-horizon training method, trained on compositions of GSM8K, gives large gains across Reasoning Gym tasks.
Just tested on some Reasoning Gym domains! h1 on long-horizon GSM transfers to:
Propositional logic
Instruct model: 22.9%
h1 training: 47.1%
Graphs (largest island)
Instruct model: 15%
h1 training: 22.5%
Algorithmic (sentence reordering)
Instruct: 9.6%
h1 training: 18.8%
1
1
16
Reasoning Gym domains - h1 on long-horizon GSM transfers to:
Propositional logic
Instruct model: 22.9%
h1 training: 47.1%
Graphs (largest island)
Instruct model: 15%
h1 training: 22.5%
Algorithmic (sentence reordering)
Instruct: 9.6%
h1 training: 18.8%
Algorithmic (manipulate
0
1
3
Just tested on some Reasoning Gym domains! h1 on long-horizon GSM transfers to:
Propositional logic
Instruct model: 22.9%
h1 training: 47.1%
Graphs (largest island)
Instruct model: 15%
h1 training: 22.5%
Algorithmic (sentence reordering)
Instruct: 9.6%
h1 training: 18.8%
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡
> RL on existing datasets saturates very quickly
> Reasoning over
0
1
24
Fresh from presenting MALT: Improving reasoning with multi-agent LLM training at @COLM2025, excited to share the next work on reasoning: this time, showing that long-horizon reasoning can be significantly improved by curriculum training on chained tasks. Fantastic efforts led by
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡
> RL on existing datasets saturates very quickly
> Reasoning over
0
2
8
One very interesting result from our work is that RL using a curriculum can teach novel capabilities that you can't elicit otherwise (even at very high pass@k). Curriculum learning is back! @jxmnop
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡
> RL on existing datasets saturates very quickly
> Reasoning over
3
14
141
Must-read paper! 👇 Was super interested in the degree of improvement in long-horizon performance even on pass@k
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡
> RL on existing datasets saturates very quickly
> Reasoning over
0
1
2
Thanks @ShashwatGoel7!
The new wave of RL papers is finally getting interesting. This one: training models on compositions of short reasoning tasks generalizes to them becoming better at other longer reasoning tasks. Here, you can train on easy gsm data -> gain on hard math. @sumeetrm et al. cooked.
0
0
1
Free lunch to turn your models from short -> long horizon reasoners. Congratz on the release! @sumeetrm
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡
> RL on existing datasets saturates very quickly
> Reasoning over
1
1
16
New @Microsoft + Princeton + Oxford paper shows how to train LLMs for long multi-step reasoning without any new labeled data, by chaining short problems and using outcome-only reinforcement learning with a growing-length curriculum. The big deal is that long-horizon skills
5
40
222
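A hedged sketch of the growing-length curriculum described in the tweet above: train at one chain length, and lengthen the chains once the recent success rate clears a threshold. `rl_update` and `success_rate` are caller-supplied stand-ins (hypothetical, not the paper's code), so the snippet only expresses the schedule:

```python
from typing import Callable

def curriculum_rl(rl_update: Callable[[int], None],
                  success_rate: Callable[[int], float],
                  max_horizon: int = 16,
                  promote_at: float = 0.7,
                  stage_steps: int = 1000) -> None:
    """Outcome-only RL on chained tasks, lengthening the chains as the
    policy masters each horizon. rl_update(h) does one training step on
    chains of length h; success_rate(h) estimates recent accuracy there."""
    h = 1
    while h <= max_horizon:
        for _ in range(stage_steps):
            rl_update(h)
        if success_rate(h) >= promote_at:
            h += 1                # promote only once the current length is learned
```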