Brian Bartoldson
@bartoldson
Followers
353
Following
2K
Media
14
Statuses
237
ML researcher
USA
Joined October 2016
We fixed a major LLM post-training bottleneck! Our new method (TBA) combines trajectory balance with asynchronous training to speed up LLM RL 5-50x while improving results+scalability. For example, using VinePPO's GSM8K setup, we obtain +1.2% accuracy and 50x faster RL.
3
53
257
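Below is a minimal sketch of the TBA recipe described in the tweet above, assuming the standard trajectory-balance objective and a simple replay buffer; the names here (ReplayBuffer, tb_loss, seq_logprob) are illustrative placeholders, not the official TBA codebase API.

```python
# Sketch: asynchronous "searcher" workers would fill the buffer with off-policy
# rollouts while a trainer samples (possibly stale) trajectories and minimizes a
# trajectory-balance loss. Dummy tensors stand in for real model log-probs/rewards.
import random
import torch

class ReplayBuffer:
    """Holds rollouts produced by searchers; sampled off-policy by the trainer."""
    def __init__(self, capacity=10_000):
        self.items, self.capacity = [], capacity

    def add(self, item):
        self.items.append(item)
        if len(self.items) > self.capacity:
            self.items.pop(0)

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

def tb_loss(log_z, seq_logprob, log_reward):
    # Trajectory balance: (log Z(x) + log pi_theta(y|x) - log R(x, y))^2, averaged.
    return (log_z + seq_logprob - log_reward).pow(2).mean()

log_z = torch.zeros(1, requires_grad=True)   # learnable log-partition estimate
optimizer = torch.optim.Adam([log_z], lr=1e-2)

buffer = ReplayBuffer()
for _ in range(32):                          # searchers would do this asynchronously
    buffer.add({"seq_logprob": torch.randn(1), "log_reward": torch.randn(1)})

batch = buffer.sample(8)
loss = tb_loss(log_z,
               torch.cat([b["seq_logprob"] for b in batch]),
               torch.cat([b["log_reward"] for b in batch]))
optimizer.zero_grad(); loss.backward(); optimizer.step()
```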
Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they're inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy on a per-training-FLOP basis. 1/7
10
65
386
We converted pretrained LLMs into looped LLMs that can crank up performance by looping for more iterations. Our looped models surpass the performance of the pretrained models we started out with, showing that existing models benefit from increased computational depth. 1/9
10
25
148
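As a rough illustration of the looped-model idea in the two posts above, here is a weight-tied loop over a block of transformer layers; increasing num_loops increases effective computational depth at test time. This is a sketch under my own assumptions, not the exact architecture from either thread.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """Applies a shared (weight-tied) stack of layers multiple times."""
    def __init__(self, layers: nn.ModuleList, num_loops: int = 4):
        super().__init__()
        self.layers = layers        # pretrained layers reused across loops
        self.num_loops = num_loops  # crank this up for more compute per token

    def forward(self, hidden_states):
        for _ in range(self.num_loops):
            for layer in self.layers:
                hidden_states = layer(hidden_states)
        return hidden_states

# Toy usage with generic encoder layers standing in for a pretrained LLM's blocks.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(2)]
)
looped = LoopedBlock(layers, num_loops=3)
out = looped(torch.randn(1, 10, 64))   # same parameters, 3x the depth
```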
New work on improving test-time scaling for challenging reasoning problems! Recursive Self-Aggregation (RSA) is a simple and scalable approach to combine sequential and parallel reasoning effectively. Check out @siddarthv66's thread for details. Some of my perspectives below:
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA), the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! Thread below!
1
15
101
Introducing Recursive Self-Aggregation (RSA): a simple test-time method that unlocks deep thinking in LLMs by evolving & aggregating reasoning chains. • Qwen3-4B matches capabilities of much larger models (DeepSeek-R1, o3-mini) • Massive gains on
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA), the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! Thread below!
2
7
28
Introducing RSA (Recursive Self-Aggregation): unlocking deep thinking with test-time scaling. Qwen3-4B + RSA > DeepSeek-R1. Gains across Qwen, Nemo, GPT-OSS. Benchmarks: Math • Reasoning Gym • Code. Aggregation-aware RL lets Qwen3-4B surpass o3-mini.
1
6
28
Qwen3-4B can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling? Introducing Recursive Self-Aggregation (RSA), a new test-time scaling method:
- parallel + sequential
- no verifiers
- no scaffolding
Then we use aggregation-aware RL to push further! Thread below.
1
11
35
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA), the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! Thread below!
23
102
784
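A minimal sketch of the Recursive Self-Aggregation loop described in the RSA posts above: keep a population of candidate solutions, repeatedly ask the model to aggregate small subsets into improved candidates, and recurse. The generate callable and the aggregation prompt wording are placeholders, not the paper's exact prompts or hyperparameters.

```python
import random

def rsa(question, generate, population_size=8, subset_size=3, rounds=4):
    """Recursive self-aggregation: parallel sampling plus sequential refinement."""
    # Round 0: independent parallel samples (parallel scaling).
    population = [generate(question) for _ in range(population_size)]

    for _ in range(rounds):                      # sequential scaling across rounds
        new_population = []
        for _ in range(population_size):
            subset = random.sample(population, subset_size)
            prompt = (
                f"Question: {question}\n\n"
                + "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(subset))
                + "\n\nAggregate the candidates above into a single improved solution."
            )
            new_population.append(generate(prompt))
        population = new_population

    return population  # e.g., take a final aggregation or majority vote over these

# Usage: rsa("Compute 17 * 24.", generate=my_llm_call) with any LLM sampling function.
```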
LLM security is a cat-and-mouse game. Attackers adapt. Prompts mutate. Meanwhile, most defenses? Static. Fragile. One-shot fixes. It's time for something smarter. Meet AegisLLM: an agentic runtime defense that thinks, reacts, and learns, just like the attackers do.
2
27
91
Excited to introduce our latest work GRESO: a method that identifies and skips millions of uninformative prompts before rollout and achieves up to 2.0x wall-clock time speedup in training. More rollouts lead to better model performance, but they're also a major bottleneck in
1
30
165
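The GRESO post above doesn't spell out the selection rule, so the sketch below is only one plausible instantiation of "skip uninformative prompts before rollout": under group-based RL, prompts whose recent rollouts all earned the same reward contribute zero advantage and can be skipped to save generation compute. Treat the criterion and names here as assumptions, not GRESO's actual algorithm.

```python
def select_prompts(prompts, reward_history, min_spread=1e-6):
    """Keep prompts that are new or whose recent rewards still vary (assumed criterion)."""
    selected = []
    for prompt in prompts:
        rewards = reward_history.get(prompt, [])
        if not rewards or max(rewards) - min(rewards) > min_spread:
            selected.append(prompt)  # no history yet, or rewards still informative
        # else: identical rewards -> zero advantage -> skip before rollout
    return selected

# Usage: filtered = select_prompts(batch_prompts, {"p1": [1.0, 1.0], "p2": [0.0, 1.0]})
```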
Here's a free/gift link to the Washington Post article about training LLMs on openly licensed text: https://t.co/fQ6aqfwUwJ.
https://t.co/AUUhEYviu0
For more, check out... Paper: https://t.co/FdjRmtPG0N Artifacts: https://t.co/Ab2qekWqHv GitHub: https://t.co/1NVQJjuDRj EleutherAI's blog post: https://t.co/bdP1HADFTM Coverage in @washingtonpost by @nitashatiku:
1
1
6
Can you train a performant language model without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1&2
16
146
584
Come chat with @bartoldson and @JainMoksh at our TBA poster at the #ICLR25 workshop on Open Science for Foundation Models (SCI-FM). The workshop will be held in EXPO Hall 4 #5 on Monday, April 28th.
At #ICLR2025 and interested in the science of deep RL? 2 great papers are being presented today from 3-5:30 PM. Don't Flatten, Tokenize! - Spotlight presentation at #363. Neuroplastic Expansion - Poster presentation at #361. Don't miss it, go chat with amazing co-authors!
0
5
19
Try out Trajectory Balance with Asynchrony via https://t.co/bolArKJUzf.
github.com
Official implementation of TBA for async LLM post-training. - bbartoldson/TBA
0
0
4
The code for LLM post-training with TBA is now available! Try out Trajectory Balance with Asynchrony via https://t.co/w173u63tHM.
https://t.co/suu63c2nRs
We fixed a major LLM post-training bottleneck! Our new method (TBA) combines trajectory balance with asynchronous training to speed up LLM RL 5-50x while improving results+scalability. For example, using VinePPO's GSM8K setup, we obtain +1.2% accuracy and 50x faster RL.
0
7
26
Concerned about visual jailbreaking attacks holding back Vision-Language Model (VLM) deployment? Excited to announce our latest research: Double Visual Defense! TL;DR: We introduce ΔCLIP and Δ²LLaVA, the first to reconcile robust adversarial performance with
1
7
21
Interested in adopting Large Reasoning Models (LRMs) but concerned about safety risks? Meet STAR-1: a compact, high-quality safety dataset (just 1K samples!) boosting LRMs' safety by 40% with only a minimal (~1.1%) reasoning drop! How we built STAR-1 in just 3
2
17
73
[LG] Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training. B. R. Bartoldson, S. Venkatraman, J. Diffenderfer, M. Jain... [Lawrence Livermore National Laboratory & Mila] (2025) https://t.co/uRzhe3CdlS
0
10
35
New paper on synthetic pretraining! We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm "reasoning to learn". https://t.co/Xd9sLKKVsl Here's how it works (thread).
15
105
488
At @Livermore_Lab, we are using AI to solve nuclear fusion, discover critical materials, and red-team vulnerabilities, all to push science forward and protect national security. Post-training LLMs at scale can unlock these advances. But even with El Capitan, the world's
1
1
9
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training TBA is a scalable RL system for LLM post-training that uses off-policy data and replay buffers with Trajectory Balance. It decouples training from search, improving speed
2
1
17