Sean McLeish ✈️ NeurIPS
@SeanMcleish
Followers: 593 · Following: 361 · Media: 18 · Statuses: 118
PhD student at the University of Maryland
Joined November 2023
Looped latent reasoning models like TRM, HRM, Ouro, and Huginn are great for reasoning, but they’re inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy per training FLOP. 📜1/7
Must-read AI research of the week:
▪️ LeJEPA
▪️ The Path Not Taken: RLVR Provably Learns Off the Principals
▪️ RLVE: Scaling Up Reinforcement Learning for LMs with Adaptive Verifiable Environments
▪️ Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
▪️ …
Make your Language Models *think* deeper with Retrofitted Recurrence. New research shows how to convert existing pretrained LMs into depth-recurrent models. This decouples training & test-time compute, improving performance on tasks like mathematics while reducing cost.
[CL] Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence S McLeish, A Li, J Kirchenbauer, D S Kalra... [University of Maryland & New York University] (2025) https://t.co/p8FHwihga3
Curious about AI use in paper writing or reviews? We ran every paper and every review through @pangramlabs, and this is what we found. 🧵
ICLR authors, want to check if your reviews are likely AI-generated? ICLR reviewers, want to check if your paper is likely AI-generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI-generated?
An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3
This paper teaches existing LLMs to “think longer” by adding a loop inside the network. They cut the model into a prelude, a recurrent block, and a coda, then run the block multiple times. A small adapter mixes the prelude’s features with the running hidden state so each loop can build on the last.
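The prelude / core / coda split described in that tweet can be sketched in a few lines. This is a toy illustration only: single weight matrices stand in for real transformer layers, and every name and size below is an assumption rather than anything taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Toy stand-ins for the three parts of a looped LM (the real model uses
# stacks of transformer layers, not single matrices).
W_prelude = rng.normal(size=(d, d))
W_adapter = rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d)  # mixes prelude + state
W_core = rng.normal(size=(d, d)) / np.sqrt(d)             # the block that loops
W_coda = rng.normal(size=(d, d))                          # produces the output

def looped_forward(x, num_loops):
    e = W_prelude @ x          # prelude: embed the input once
    s = np.zeros_like(e)       # running hidden state
    for _ in range(num_loops):
        mixed = W_adapter @ np.concatenate([e, s])  # adapter re-injects prelude features
        s = np.tanh(W_core @ mixed)                 # core block, iterated
    return W_coda @ s          # coda

x = rng.normal(size=d)
y4 = looped_forward(x, num_loops=4)
y8 = looped_forward(x, num_loops=8)  # more test-time compute, same parameters
```

The key property the sketch shows: `num_loops` is a runtime knob, so training and test-time compute are decoupled.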
Llama models can increase their performance on reasoning benchmarks if we fine-tune them to "think" longer with recurrence.
Check out this paper from the reasoning 🐐
Thanks to @SeanMcleish, now you can turn your favorite pretrained LLM (like LLaMA) into a deeper thinker by teaching it to loop itself. LLMs become smarter after this upgrade 🔁🔁🔁🧠
Latent reasoning models show a lot of promise, but so far the research has explored training them from scratch. My colleagues take the next step and explore how to turn pretrained language models into latent reasoning models in the thread above.
Check out Micah's breakdown of the paper in his thread below.
Paper 📖: https://t.co/oSauOzU7iE Code 💻: https://t.co/uERxkGLIY7 Models 🤗: https://t.co/91TO6bnJB7 Thanks to my amazing collaborators: @iamleonli, @jwkirchenbauer, @dayal_kalra, @bartoldson, @bkailkhu, @A_v_i__S, @jonasgeiping, @tomgoldsteincs, @micahgoldblum 7/7
Finally, we focus on creating an all-around good language model that is depth-recurrent, competing with and sometimes beating our Huginn-0125 model with <1/3 of the parameters and far less training compute. 6/7
Overall, we see accuracy gains across the board for TinyLlama, Llama and Olmo on GSM8K and MATH. 5/7
We see our biggest gains on GSM8K: even after removing >25% of TinyLlama's parameters, looping a core block beats fine-tuning the fixed-depth baseline on a per-FLOP basis. 4/7
Our looped models are trained with the depth randomly sampled at each step, like Huginn-0125. We find that scheduling the mean of this distribution up to its maximum over the course of training causes no performance decrease while saving a lot of FLOPs. 3/7
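The depth schedule in that tweet can be sketched as follows. The linear ramp and the Gaussian (Poisson-like) sampler are illustrative assumptions of my own, not the paper's exact distribution or ramp shape.

```python
import random

def scheduled_depth(step, total_steps, max_mean=16, min_depth=1):
    """Sample a recurrence depth whose mean ramps linearly from 1 up to
    max_mean over training (sketch only; the exact distribution and
    schedule used in the paper are not reproduced here)."""
    frac = min(step / total_steps, 1.0)
    mean = 1 + frac * (max_mean - 1)
    # Poisson-like sample around the scheduled mean (illustrative choice).
    return max(min_depth, round(random.gauss(mean, mean ** 0.5)))
```

Early training steps draw shallow depths, so the core block runs fewer times per step; only late in training does the average depth reach its maximum, which is where the FLOP savings come from.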
First, we find that initializing from pretrained models is better than random initialization: we can transfer knowledge from static-depth models into looped models. 2/7
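One way to picture that initialization: copy spans of the pretrained layer stack into the looped architecture's three parts. The split sizes below are hypothetical, and integers stand in for real layer modules.

```python
def partition_for_looping(layers, n_prelude, n_core, n_coda):
    """Split a pretrained layer stack into prelude / core / coda.
    The discarded middle layers are what shrinks the parameter count
    (cf. the >25% reduction mentioned for TinyLlama)."""
    assert n_prelude + n_core + n_coda <= len(layers)
    prelude = layers[:n_prelude]
    core = layers[n_prelude:n_prelude + n_core]  # this span is iterated at runtime
    coda = layers[-n_coda:]
    dropped = len(layers) - (n_prelude + n_core + n_coda)
    return prelude, core, coda, dropped

# Toy usage: a 22-layer stack, keeping 2 + 8 + 2 layers.
prelude, core, coda, dropped = partition_for_looping(list(range(22)), 2, 8, 2)
```

Initializing the looped model from these spans, rather than from scratch, is what lets it inherit the static-depth model's knowledge.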
🚨We converted pretrained LLMs into looped LLMs that can crank up performance by looping for more iterations. Our looped models surpass the performance of the pretrained models we started out with, showing that existing models benefit from increased computational depth. 📜1/9
our Gemstones paper on Scaling Laws is accepted at @NeurIPSConf! we release a bunch of models trained up to 2B params with varying width / depth and analyze the impact of scaling hidden dimension vs number of blocks in terms of FLOP-optimal and GPU-hour-optimal training. 🧵
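For a rough sense of the width-vs-depth tradeoff the Gemstones work studies, the standard back-of-envelope formulas (~12·d²·L parameters per transformer stack, 6·N·D training FLOPs) already show how different shapes can match on one axis and diverge on another. These are generic estimates, not the paper's accounting.

```python
def approx_params(width, depth, vocab=50_000):
    """Very rough transformer parameter count: ~12*width^2 per block,
    plus tied-in embedding/unembedding tables (generic estimate)."""
    return 12 * width**2 * depth + 2 * vocab * width

def train_flops(n_params, n_tokens):
    """Common 6*N*D estimate of training compute."""
    return 6 * n_params * n_tokens

# Two shapes with identical block parameters but different embedding cost:
wide_shallow = approx_params(width=2048, depth=16)
narrow_deep = approx_params(width=1024, depth=64)
```

Under this estimate the two shapes tie on block parameters (and hence roughly on FLOPs per token), yet they can differ substantially in wall-clock GPU hours, which is why the paper distinguishes FLOP-optimal from GPU-hour-optimal scaling.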