Brian Bartoldson
@bartoldson
Followers
353
Following
2K
Media
14
Statuses
237
ML researcher
USA
Joined October 2016
We fixed a major LLM post-training bottleneck! Our new method (TBA) combines trajectory balance with asynchronous training to speed up LLM RL 5-50x while improving results+scalability. For example, using VinePPO's GSM8K setup, we obtain +1.2% accuracy and 50x faster RL.
3
53
257
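Below is a minimal sketch of the TBA recipe described in the tweet above, assuming the standard trajectory-balance objective and a simple replay buffer; the names here (ReplayBuffer, tb_loss, seq_logprob) are illustrative placeholders, not the official TBA codebase API.

```python
# Sketch: asynchronous "searcher" workers would fill the buffer with off-policy
# rollouts while a trainer samples (possibly stale) trajectories and minimizes a
# trajectory-balance loss. Dummy tensors stand in for real model log-probs/rewards.
import random
import torch

class ReplayBuffer:
    """Holds rollouts produced by searchers; sampled off-policy by the trainer."""
    def __init__(self, capacity=10_000):
        self.items, self.capacity = [], capacity

    def add(self, item):
        self.items.append(item)
        if len(self.items) > self.capacity:
            self.items.pop(0)

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

def tb_loss(log_z, seq_logprob, log_reward):
    # Trajectory balance: (log Z(x) + log pi_theta(y|x) - log R(x, y))^2, averaged.
    return (log_z + seq_logprob - log_reward).pow(2).mean()

log_z = torch.zeros(1, requires_grad=True)   # learnable log-partition estimate
optimizer = torch.optim.Adam([log_z], lr=1e-2)

buffer = ReplayBuffer()
for _ in range(32):                          # searchers would do this asynchronously
    buffer.add({"seq_logprob": torch.randn(1), "log_reward": torch.randn(1)})

batch = buffer.sample(8)
loss = tb_loss(log_z,
               torch.cat([b["seq_logprob"] for b in batch]),
               torch.cat([b["log_reward"] for b in batch]))
optimizer.zero_grad(); loss.backward(); optimizer.step()
```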
Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they're inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy on a per-training-FLOP basis. 1/7
10
65
386
We converted pretrained LLMs into looped LLMs that can crank up performance by looping for more iterations. Our looped models surpass the performance of the pretrained models we started out with, showing that existing models benefit from increased computational depth. 1/9
10
25
148
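As a rough illustration of the looped-model idea in the two posts above, here is a weight-tied loop over a block of transformer layers; increasing num_loops increases effective computational depth at test time. This is a sketch under my own assumptions, not the exact architecture from either thread.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """Applies a shared (weight-tied) stack of layers multiple times."""
    def __init__(self, layers: nn.ModuleList, num_loops: int = 4):
        super().__init__()
        self.layers = layers        # pretrained layers reused across loops
        self.num_loops = num_loops  # crank this up for more compute per token

    def forward(self, hidden_states):
        for _ in range(self.num_loops):
            for layer in self.layers:
                hidden_states = layer(hidden_states)
        return hidden_states

# Toy usage with generic encoder layers standing in for a pretrained LLM's blocks.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(2)]
)
looped = LoopedBlock(layers, num_loops=3)
out = looped(torch.randn(1, 10, 64))   # same parameters, 3x the depth
```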
New work on improving test-time scaling for challenging reasoning problems! Recursive Self-Aggregation (RSA) is a simple and scalable approach to combine sequential and parallel reasoning effectively. Check out @siddarthv66's thread for details. Some of my perspectives below:
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA), the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! Thread below!
1
15
101
Introducing Recursive Self-Aggregation (RSA): a simple test-time method that unlocks deep thinking in LLMs by evolving & aggregating reasoning chains. • Qwen3-4B matches capabilities of much larger models (DeepSeek-R1, o3-mini) • Massive gains on
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA), the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! Thread below!
2
7
28
Introducing RSA (Recursive Self-Aggregation): unlocking deep thinking with test-time scaling. Qwen3-4B + RSA > DeepSeek-R1. Gains across Qwen, Nemo, GPT-OSS. Benchmarks: Math • Reasoning Gym • Code. Aggregation-aware RL lets Qwen3-4B surpass o3-mini.
1
6
28
Qwen3-4B can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling? Introducing Recursive Self-Aggregation (RSA), a new test-time scaling method:
- parallel + sequential
- no verifiers
- no scaffolding
Then we use aggregation-aware RL to push further! Thread below.
1
11
35
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA), the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! Thread below!
23
102
784
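A minimal sketch of the Recursive Self-Aggregation loop described in the RSA posts above: keep a population of candidate solutions, repeatedly ask the model to aggregate small subsets into improved candidates, and recurse. The generate callable and the aggregation prompt wording are placeholders, not the paper's exact prompts or hyperparameters.

```python
import random

def rsa(question, generate, population_size=8, subset_size=3, rounds=4):
    """Recursive self-aggregation: parallel sampling plus sequential refinement."""
    # Round 0: independent parallel samples (parallel scaling).
    population = [generate(question) for _ in range(population_size)]

    for _ in range(rounds):                      # sequential scaling across rounds
        new_population = []
        for _ in range(population_size):
            subset = random.sample(population, subset_size)
            prompt = (
                f"Question: {question}\n\n"
                + "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(subset))
                + "\n\nAggregate the candidates above into a single improved solution."
            )
            new_population.append(generate(prompt))
        population = new_population

    return population  # e.g., take a final aggregation or majority vote over these

# Usage: rsa("Compute 17 * 24.", generate=my_llm_call) with any LLM sampling function.
```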
LLM security is a cat-and-mouse game. Attackers adapt. Prompts mutate. Meanwhile, most defenses? Static. Fragile. One-shot fixes. It's time for something smarter. Meet AegisLLM: an agentic runtime defense that thinks, reacts, and learns, just like the attackers do.
2
27
91
Excited to introduce our latest work GRESO: a method that identifies and skips millions of uninformative prompts before rollout and achieves up to 2.0x wall-clock time speedup in training. More rollouts lead to better model performance, but they're also a major bottleneck in
1
30
165
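The GRESO post above doesn't spell out the selection rule, so the sketch below is only one plausible instantiation of "skip uninformative prompts before rollout": under group-based RL, prompts whose recent rollouts all earned the same reward contribute zero advantage and can be skipped to save generation compute. Treat the criterion and names here as assumptions, not GRESO's actual algorithm.

```python
def select_prompts(prompts, reward_history, min_spread=1e-6):
    """Keep prompts that are new or whose recent rewards still vary (assumed criterion)."""
    selected = []
    for prompt in prompts:
        rewards = reward_history.get(prompt, [])
        if not rewards or max(rewards) - min(rewards) > min_spread:
            selected.append(prompt)  # no history yet, or rewards still informative
        # else: identical rewards -> zero advantage -> skip before rollout
    return selected

# Usage: filtered = select_prompts(batch_prompts, {"p1": [1.0, 1.0], "p2": [0.0, 1.0]})
```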
Here's a free/gift link to the Washington Post article about training LLMs on openly licensed text: https://t.co/fQ6aqfwUwJ.
https://t.co/AUUhEYviu0
For more, check out... Paper: https://t.co/FdjRmtPG0N Artifacts: https://t.co/Ab2qekWqHv GitHub: https://t.co/1NVQJjuDRj EleutherAI's blog post: https://t.co/bdP1HADFTM Coverage in @washingtonpost by @nitashatiku:
1
1
6
Can you train a performant language model without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1&2
16
146
584
Come chat with @bartoldson and @JainMoksh at our TBA poster at the #ICLR25 workshop on Open Science for Foundation Models (SCI-FM). The workshop will be held in EXPO Hall 4 #5 on Monday, April 28th.
At #ICLR2025 and interested in the science of deep RL? 2 great papers are being presented today from 3-5:30 PM. Don't Flatten, Tokenize! - Spotlight presentation at #363. Neuroplastic Expansion - Poster presentation at #361. Don't miss it, go chat with amazing co-authors!
0
5
19
Try out Trajectory Balance with Asynchrony via https://t.co/bolArKJUzf.
github.com
Official implementation of TBA for async LLM post-training. - bbartoldson/TBA
0
0
4
The code for LLM post-training with TBA is now available! Try out Trajectory Balance with Asynchrony via https://t.co/w173u63tHM.
https://t.co/suu63c2nRs
We fixed a major LLM post-training bottleneck! Our new method (TBA) combines trajectory balance with asynchronous training to speed up LLM RL 5-50x while improving results+scalability. For example, using VinePPO's GSM8K setup, we obtain +1.2% accuracy and 50x faster RL.
0
7
26
Concerned about visual jailbreaking attacks holding back Vision-Language Model (VLM) deployment? Excited to announce our latest research: Double Visual Defense! TL;DR: We introduce ΔCLIP and Δ²LLaVA, the first to reconcile robust adversarial performance with
1
7
21
Interested in adopting Large Reasoning Models (LRMs) but concerned about safety risks? Meet STAR-1: a compact, high-quality safety dataset (just 1K samples!) boosting LRMs' safety by 40% with only a minimal (~1.1%) reasoning drop! How we built STAR-1 in just 3
2
17
73
[LG] Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training. B. R. Bartoldson, S. Venkatraman, J. Diffenderfer, M. Jain... [Lawrence Livermore National Laboratory & Mila] (2025) https://t.co/uRzhe3CdlS
0
10
35
New paper on synthetic pretraining! We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm "reasoning to learn". https://t.co/Xd9sLKKVsl Here's how it works (thread).
15
105
488
At @Livermore_Lab, we are using AI to solve nuclear fusion, discover critical materials, and red-team vulnerabilities, all to push science forward and protect national security. Post-training LLMs at scale can unlock these advances. But even with El Capitan, the world's
1
1
9
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training TBA is a scalable RL system for LLM post-training that uses off-policy data and replay buffers with Trajectory Balance. It decouples training from search, improving speed
2
1
17