Sean McLeish ✈️ NeurIPS
@SeanMcleish
Followers: 593 · Following: 361 · Media: 18 · Statuses: 118
PhD student at the University of Maryland
Joined November 2023
Looped latent reasoning models like TRM, HRM, Ouro, and Huginn are great for reasoning, but they’re inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy per training FLOP. 📜1/7
Must-read AI research of the week:
▪️ LeJEPA
▪️ The Path Not Taken: RLVR Provably Learns Off the Principals
▪️ RLVE: Scaling Up Reinforcement Learning for LMs with Adaptive Verifiable Environments
▪️ Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
▪️ …
Make your Language Models *think* deeper with Retrofitted Recurrence. New research shows how to convert existing pretrained LMs into depth-recurrent models. This decouples training & test-time compute, improving performance on tasks like mathematics while reducing cost.
[CL] Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence S McLeish, A Li, J Kirchenbauer, D S Kalra... [University of Maryland & New York University] (2025) https://t.co/p8FHwihga3
Curious about AI use in paper writing or reviews? We ran every paper and every review through @pangramlabs, and this is what we found. 🧵
ICLR authors, want to check if your reviews are likely AI-generated? ICLR reviewers, want to check if your paper is likely AI-generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI-generated?
An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3
This paper teaches existing LLMs to “think longer” by adding a loop inside the network. They cut the model into a prelude, a recurrent block, and a coda, then run the block multiple times. A small adapter mixes the prelude’s features with the running hidden state so each loop can build on the last.
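The prelude / core / coda split described in that tweet can be sketched in a few lines. This is a toy illustration only: single weight matrices stand in for real transformer layers, and every name and size below is an assumption rather than anything taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Toy stand-ins for the three parts of a looped LM (the real model uses
# stacks of transformer layers, not single matrices).
W_prelude = rng.normal(size=(d, d))
W_adapter = rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d)  # mixes prelude + state
W_core = rng.normal(size=(d, d)) / np.sqrt(d)             # the block that loops
W_coda = rng.normal(size=(d, d))                          # produces the output

def looped_forward(x, num_loops):
    e = W_prelude @ x          # prelude: embed the input once
    s = np.zeros_like(e)       # running hidden state
    for _ in range(num_loops):
        mixed = W_adapter @ np.concatenate([e, s])  # adapter re-injects prelude features
        s = np.tanh(W_core @ mixed)                 # core block, iterated
    return W_coda @ s          # coda

x = rng.normal(size=d)
y4 = looped_forward(x, num_loops=4)
y8 = looped_forward(x, num_loops=8)  # more test-time compute, same parameters
```

The key property the sketch shows: `num_loops` is a runtime knob, so training and test-time compute are decoupled.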
Llama models can increase their performance on reasoning benchmarks if we fine-tune them to "think" longer with recurrence.
Check out this paper from the reasoning 🐐
Thanks to @SeanMcleish, now you can turn your favorite pretrained LLM (like LLaMA) into a deeper thinker by teaching it to loop itself. LLMs become smarter after this upgrade 🔁🔁🔁🧠
Latent reasoning models show a lot of promise, but so far the research has explored training them from scratch. My colleagues take the next step and explore how to turn pretrained language models into latent reasoning models in the thread above.
Check out Micah's breakdown of the paper in his thread below.
Paper 📖: https://t.co/oSauOzU7iE Code 💻: https://t.co/uERxkGLIY7 Models 🤗: https://t.co/91TO6bnJB7 Thanks to my amazing collaborators: @iamleonli, @jwkirchenbauer, @dayal_kalra, @bartoldson, @bkailkhu, @A_v_i__S, @jonasgeiping, @tomgoldsteincs, @micahgoldblum 7/7
Finally, we focus on creating an all-around good language model that is depth-recurrent, competing with and sometimes beating our Huginn-0125 model with <1/3 of the parameters and far less training compute. 6/7
Overall, we see accuracy gains across the board for TinyLlama, Llama and Olmo on GSM8K and MATH. 5/7
We see our biggest gains on GSM8K: even after removing >25% of TinyLlama's parameters, looping a core block beats fine-tuning the fixed-depth baseline on a per-FLOP basis. 4/7
Our looped models are trained with the depth randomly sampled at each step, like Huginn-0125. We find that scheduling the mean of this distribution up to its maximum over the course of training causes no performance decrease while saving a lot of FLOPs. 3/7
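The depth schedule in that tweet can be sketched as follows. The linear ramp and the Gaussian (Poisson-like) sampler are illustrative assumptions of my own, not the paper's exact distribution or ramp shape.

```python
import random

def scheduled_depth(step, total_steps, max_mean=16, min_depth=1):
    """Sample a recurrence depth whose mean ramps linearly from 1 up to
    max_mean over training (sketch only; the exact distribution and
    schedule used in the paper are not reproduced here)."""
    frac = min(step / total_steps, 1.0)
    mean = 1 + frac * (max_mean - 1)
    # Poisson-like sample around the scheduled mean (illustrative choice).
    return max(min_depth, round(random.gauss(mean, mean ** 0.5)))
```

Early training steps draw shallow depths, so the core block runs fewer times per step; only late in training does the average depth reach its maximum, which is where the FLOP savings come from.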
First, we find that initializing from pretrained models is better than random initialization: we can transfer knowledge from static-depth models into looped models. 2/7
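One way to picture that initialization: copy spans of the pretrained layer stack into the looped architecture's three parts. The split sizes below are hypothetical, and integers stand in for real layer modules.

```python
def partition_for_looping(layers, n_prelude, n_core, n_coda):
    """Split a pretrained layer stack into prelude / core / coda.
    The discarded middle layers are what shrinks the parameter count
    (cf. the >25% reduction mentioned for TinyLlama)."""
    assert n_prelude + n_core + n_coda <= len(layers)
    prelude = layers[:n_prelude]
    core = layers[n_prelude:n_prelude + n_core]  # this span is iterated at runtime
    coda = layers[-n_coda:]
    dropped = len(layers) - (n_prelude + n_core + n_coda)
    return prelude, core, coda, dropped

# Toy usage: a 22-layer stack, keeping 2 + 8 + 2 layers.
prelude, core, coda, dropped = partition_for_looping(list(range(22)), 2, 8, 2)
```

Initializing the looped model from these spans, rather than from scratch, is what lets it inherit the static-depth model's knowledge.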
🚨We converted pretrained LLMs into looped LLMs that can crank up performance by looping for more iterations. Our looped models surpass the performance of the pretrained models we started out with, showing that existing models benefit from increased computational depth. 📜1/9
our Gemstones paper on Scaling Laws is accepted at @NeurIPSConf! we release a bunch of models trained up to 2B params with varying width / depth and analyze the impact of scaling hidden dimension vs number of blocks in terms of FLOP-optimal and GPU-hour-optimal training. 🧵
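For a rough sense of the width-vs-depth tradeoff the Gemstones work studies, the standard back-of-envelope formulas (~12·d²·L parameters per transformer stack, 6·N·D training FLOPs) already show how different shapes can match on one axis and diverge on another. These are generic estimates, not the paper's accounting.

```python
def approx_params(width, depth, vocab=50_000):
    """Very rough transformer parameter count: ~12*width^2 per block,
    plus tied-in embedding/unembedding tables (generic estimate)."""
    return 12 * width**2 * depth + 2 * vocab * width

def train_flops(n_params, n_tokens):
    """Common 6*N*D estimate of training compute."""
    return 6 * n_params * n_tokens

# Two shapes with identical block parameters but different embedding cost:
wide_shallow = approx_params(width=2048, depth=16)
narrow_deep = approx_params(width=1024, depth=64)
```

Under this estimate the two shapes tie on block parameters (and hence roughly on FLOPs per token), yet they can differ substantially in wall-clock GPU hours, which is why the paper distinguishes FLOP-optimal from GPU-hour-optimal scaling.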