Brian Bartoldson Profile
Brian Bartoldson

@bartoldson

Followers: 353 · Following: 2K · Media: 14 · Statuses: 237

ML researcher

USA
Joined October 2016
@bartoldson
Brian Bartoldson
8 months
🚀 We fixed a major LLM post-training bottleneck! Our new method (TBA) combines trajectory balance with asynchronous training to speed up LLM RL 5-50x while improving results+scalability. For example, using VinePPO's GSM8K setup, we obtain +1.2% accuracy and 50x faster RL.
3
53
257
@SeanMcleish
Sean McLeish ✈️ NeurIPS
16 days
Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they're inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy on a per-training-FLOP basis. 📜 1/7
10
65
386
@micahgoldblum
Micah Goldblum
16 days
🚨 We converted pretrained LLMs into looped LLMs that can crank up performance by looping for more iterations. Our looped models surpass the performance of the pretrained models we started out with, showing that existing models benefit from increased computational depth. 📜 1/9
10
25
148
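The two looped-model posts above hinge on one mechanism: reuse a shared block of layers for extra iterations, so that additional test-time compute buys additional effective depth. Below is a minimal PyTorch sketch of that looping pattern; the `LoopedBlock` module, layer sizes, and loop counts are hypothetical illustrations, not the architecture from TRM/HRM/Ouro/Huginn or the post-training recipe described in these threads.

```python
# Toy sketch of "looping" a small stack of layers: the same shared-weight block
# is re-applied for extra iterations, trading test-time compute for depth.
# Illustration only; shapes and loop counts are made-up parameters.
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, h: torch.Tensor, loops: int = 4) -> torch.Tensor:
        # Re-apply the same layers `loops` times over the hidden states.
        for _ in range(loops):
            for layer in self.layers:
                h = layer(h)
        return h

if __name__ == "__main__":
    h = torch.randn(1, 8, 64)  # (batch, seq, d_model) hidden states
    block = LoopedBlock()
    print(block(h, loops=1).shape, block(h, loops=8).shape)
```

Post-training a regular model into a looped one would, presumably, finetune such shared layers so their outputs stay useful as the loop count grows; that finetuning step is not shown here.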
@JainMoksh
Moksh Jain
2 months
New work on improving test-time scaling for challenging reasoning problems! Recursive Self-Aggregation (RSA) is a simple and scalable approach to combine sequential and parallel reasoning effectively. Check out @siddarthv66's thread for details. Some of my perspectives below:
@siddarthv66
Siddarth Venkatraman
2 months
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) – the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵 below!
1
15
101
@bkailkhu
Bhavya Kailkhura
2 months
Introducing ๐‘๐ž๐œ๐ฎ๐ซ๐ฌ๐ข๐ฏ๐ž ๐’๐ž๐ฅ๐Ÿ-๐€๐ ๐ ๐ซ๐ž๐ ๐š๐ญ๐ข๐จ๐ง (๐‘๐’๐€): a simple test-time method that unlocks deep thinking in LLMs by evolving & aggregating reasoning chains. ๐Ÿ”น Qwen3โ€‘4B matches capabilities of much larger models (DeepSeekโ€‘R1, o3โ€‘mini) ๐Ÿ”น Massive gains on
@siddarthv66
Siddarth Venkatraman
2 months
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) – the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵 below!
2
7
28
@sarthmit
Sarthak Mittal
2 months
Introducing RSA 🌀 (Recursive Self-Aggregation): unlocking deep thinking with test-time scaling 🔥 Qwen3-4B + RSA > DeepSeek-R1 📈 Gains across Qwen, Nemo, GPT-OSS 🏆 Benchmarks: Math • Reasoning Gym • Code ⚡ Aggregation-aware RL lets Qwen3-4B surpass o3-mini 🚀
1
6
28
@thevineetjain
Vineet Jain
2 months
Qwen3-4B can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling? 🤯 Introducing Recursive Self-Aggregation (RSA), a new test-time scaling method: - parallel + sequential ✅ - no verifiers ✅ - no scaffolding ✅ Then we use aggregation-aware RL to push further! 🚀 🧵👇
1
11
35
@siddarthv66
Siddarth Venkatraman
2 months
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) – the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵 below!
23
102
784
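Across these RSA threads the recipe is the same: keep a population of candidate solutions (the parallel axis) and repeatedly ask the model to aggregate sampled subsets into improved candidates (the sequential axis), with no external verifier. A rough Python sketch under those assumptions follows; the `generate` stub, prompt wording, population size, subset size, and round count are placeholders, not the paper's settings.

```python
# Minimal sketch of a recursive self-aggregation loop: sample an initial
# population of attempts, then evolve it by aggregating random subsets.
# `generate` is a stand-in for any LLM call (e.g., Qwen3-4B-Instruct).
import random

def generate(prompt: str) -> str:
    """Placeholder for a language-model call; plug in your own client here."""
    raise NotImplementedError

def recursive_self_aggregation(question: str, population: int = 8,
                               subset: int = 3, rounds: int = 4) -> str:
    # Parallel stage: independent initial attempts.
    candidates = [generate(f"Solve step by step:\n{question}") for _ in range(population)]
    # Sequential stage: evolve the population by aggregating sampled subsets.
    for _ in range(rounds):
        new_candidates = []
        for _ in range(population):
            group = random.sample(candidates, k=min(subset, len(candidates)))
            prompt = (
                f"Question:\n{question}\n\n"
                + "\n\n".join(f"Candidate solution {i + 1}:\n{c}" for i, c in enumerate(group))
                + "\n\nCombine the correct ideas above into a single improved solution."
            )
            new_candidates.append(generate(prompt))
        candidates = new_candidates
    # No verifier: return any final candidate (or majority-vote over them).
    return candidates[0]
```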
@furongh
Furong Huang
5 months
๐Ÿญ๐Ÿ”’ LLM security is a cat-and-mouse game. Attackers adapt. Prompts mutate. Meanwhile, most defenses? ๐Ÿšซ Static. Fragile. One-shot fixes. Itโ€™s time for something smarter. โš”๏ธ Meet AegisLLM: An agentic runtime defense that thinks, reacts, and learns โ€” just like the attackers do.
2
27
91
@InfiniAILab
Infini-AI-Lab
5 months
🚀 Excited to introduce our latest work GRESO: a method that identifies and skips millions of uninformative prompts before rollout and achieves up to 2.0x wall-clock time speedup in training. More rollouts lead to better model performance, but they're also a major bottleneck in
1
30
165
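The GRESO post gives only the high-level idea: detect prompts that will not contribute learning signal and skip them before paying for rollouts. The sketch below is one hedged guess at such a filter, built on the observation that a prompt whose sampled rollouts all receive the same reward yields zero group-relative advantage; the history length and skip rule are invented for illustration and are not GRESO's actual criterion.

```python
# Illustrative pre-rollout filter: skip prompts whose recent rollout groups all
# received identical rewards (all right or all wrong), since they carry no
# learning signal. Thresholds and history length are assumptions, not GRESO's.
from collections import defaultdict, deque

class PromptFilter:
    def __init__(self, history: int = 3):
        self.rewards = defaultdict(lambda: deque(maxlen=history))

    def record(self, prompt_id: str, group_rewards: list[float]) -> None:
        self.rewards[prompt_id].append(group_rewards)

    def should_rollout(self, prompt_id: str) -> bool:
        hist = self.rewards[prompt_id]
        if len(hist) < hist.maxlen:
            return True  # not enough evidence yet; keep rolling out
        # Roll out only if some recent group showed reward spread.
        return any(max(g) != min(g) for g in hist)

filt = PromptFilter()
for _ in range(3):
    filt.record("p1", [1.0, 1.0, 1.0])
print(filt.should_rollout("p1"))  # False: consistently uninformative prompt
```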
@bartoldson
Brian Bartoldson
6 months
Here's a free/gift link to the Washington Post article about training LLMs on openly licensed text: https://t.co/fQ6aqfwUwJ. https://t.co/AUUhEYviu0
@AiEleuther
EleutherAI
6 months
For more, check out... Paper: https://t.co/FdjRmtPG0N Artifacts: https://t.co/Ab2qekWqHv GitHub: https://t.co/1NVQJjuDRj EleutherAI's blog post: https://t.co/bdP1HADFTM Coverage in @washingtonpost by @nitashatiku:
1
1
6
@AiEleuther
EleutherAI
6 months
Can you train a performant language model without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2
16
146
584
@johanobandoc
Johan Obando-Ceron 👏🏽
7 months
🥳 Come chat with @bartoldson and @JainMoksh at our TBA poster at the #ICLR25 workshop on Open Science for Foundation Models (SCI-FM). The workshop will be held in EXPO Hall 4 #5 on Monday, April 28th.
@johanobandoc
Johan Obando-Ceron 👏🏽
7 months
At #ICLR2025 and interested in the science of deep RL? 2 great papers are being presented today from 3–5:30 PM. Don't Flatten, Tokenize! - Spotlight presentation at #363. Neuroplastic Expansion - Poster presentation at #361. Don't miss it, go chat with amazing co-authors! 🥳
0
5
19
@bartoldson
Brian Bartoldson
8 months
🚀 The code for LLM post-training with TBA is now available! Try out Trajectory Balance with Asynchrony via https://t.co/w173u63tHM. https://t.co/suu63c2nRs
@bartoldson
Brian Bartoldson
8 months
🚀 We fixed a major LLM post-training bottleneck! Our new method (TBA) combines trajectory balance with asynchronous training to speed up LLM RL 5-50x while improving results+scalability. For example, using VinePPO's GSM8K setup, we obtain +1.2% accuracy and 50x faster RL.
0
7
26
@cihangxie
Cihang Xie@NeurIPS
8 months
🚨 Concerned about visual jailbreaking attacks holding back Vision-Language Model (VLM) deployment? 🌟 Excited to announce our latest research: Double Visual Defense! TL;DR: We introduce ΔCLIP and Δ²LLaVA – the first to reconcile robust adversarial performance with
1
7
21
@cihangxie
Cihang Xie@NeurIPS
8 months
🚨 Interested in adopting Large Reasoning Models (LRMs) but concerned about safety risks? 🚨 Meet STAR-1 🌟 – A compact, high-quality safety dataset (just 1K samples!) boosting LRMs' safety by 40% with only a minimal (~1.1%) reasoning drop! 🚀 How we built STAR-1 in just 3
2
17
73
@fly51fly
fly51fly
8 months
[LG] Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training. B. R. Bartoldson, S. Venkatraman, J. Diffenderfer, M. Jain... [Lawrence Livermore National Laboratory & Mila] (2025) https://t.co/uRzhe3CdlS
0
10
35
@YangjunR
Yangjun Ruan
8 months
New paper on synthetic pretraining! We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm "reasoning to learn". https://t.co/Xd9sLKKVsl Here's how it works 🧵
15
105
488
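The "reasoning to learn" post describes models synthesizing their own thoughts about limited data before pretraining on it. The snippet below is a heavily simplified, hypothetical rendering of that data-augmentation step; the prompt wording, the `generate` stub, and the thought-then-document interleaving are assumptions for illustration, not the paper's procedure.

```python
# Hypothetical "synthesize your own thoughts" data pipeline: for each raw
# document, have the model write out reasoning about it, then pretrain on the
# synthesized thoughts together with the original text. Illustration only.
def generate(prompt: str) -> str:
    """Stand-in for a call to the language model being bootstrapped."""
    raise NotImplementedError

def synthesize_corpus(documents: list[str]) -> list[str]:
    augmented = []
    for doc in documents:
        thought = generate(
            "Think step by step about what the following passage implies, "
            f"why it holds, and what background knowledge it relies on:\n\n{doc}"
        )
        # Training example: the model's own reasoning followed by the passage.
        augmented.append(thought + "\n\n" + doc)
    return augmented
```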
@bkailkhu
Bhavya Kailkhura
8 months
At @Livermore_Lab, we are using AI to: ⚛️ Solve nuclear fusion 🧪 Discover critical materials 🧠 Red-team vulnerabilities All to push science forward and protect national security 🌎 Post-training LLMs at scale can unlock these advances. But even with El Capitan – the world's
1
1
9
@gm8xx8
๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ
8 months
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training TBA is a scalable RL system for LLM post-training that uses off-policy data and replay buffers with Trajectory Balance. It decouples training from search, improving speed
2
1
17
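This summary is the most mechanistic description of TBA in the feed: trajectory-balance training on off-policy samples drawn from a replay buffer that asynchronous searcher workers keep filling, which decouples generation from learning. A minimal sketch of a trajectory-balance-style loss applied to such buffered samples is below; the scalar log Z parameter, the reward temperature, and the example numbers are illustrative assumptions rather than the paper's exact objective.

```python
# Sketch of a trajectory-balance loss on replay-buffer samples. Searcher
# workers would fill the buffer asynchronously; the trainer re-scores buffered
# answers under the current policy and minimizes the squared TB residual.
import torch

class TrajectoryBalanceLoss(torch.nn.Module):
    """Squared trajectory-balance residual: (log Z + log p(y|x) - r(x, y) / beta)^2."""

    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.log_z = torch.nn.Parameter(torch.zeros(()))  # learned log-partition estimate
        self.beta = beta                                   # reward temperature

    def forward(self, log_prob: torch.Tensor, reward: torch.Tensor) -> torch.Tensor:
        return ((self.log_z + log_prob - reward / self.beta) ** 2).mean()

# Off-policy usage: (log_prob, reward) pairs come from buffered trajectories,
# with log_prob re-scored under the current policy, so training never waits on rollouts.
loss_fn = TrajectoryBalanceLoss(beta=0.05)
log_prob = torch.tensor([-35.2, -41.7])  # summed token log-probs of buffered answers
reward = torch.tensor([1.0, 0.0])        # e.g., GSM8K correctness rewards
print(loss_fn(log_prob, reward))
```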