SynthLabs
@synth_labs
Followers 15K · Following 273 · Media 30 · Statuses 142
Scaling Up Good Synthetic Reasoning. We're hiring! ➡️ https://t.co/Gqtk6KrIy6 / DM 💬 @NathanThinks
Joined May 2022
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning! 📝 250,000+ problems, 47k NEW Q's ✅ 10x larger than existing datasets like MATH 🧑⚖️ Verifiable—we eliminated 400k+ problems Details below! 🧵👇
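(Not from the thread, just an illustration of what "verifiable" buys you for RL: every rollout can be scored by checking its final answer against the reference. A minimal sketch, assuming a \boxed{...} answer convention and sympy for symbolic equivalence; neither is necessarily the released pipeline.)

```python
import re
from sympy import simplify, sympify

def extract_boxed(text: str) -> str | None:
    """Pull the last \\boxed{...} expression out of a model rollout (assumed convention)."""
    matches = re.findall(r"\\boxed\{([^{}]+)\}", text)
    return matches[-1] if matches else None

def answers_match(candidate: str | None, reference: str) -> bool:
    """Verifiable reward: exact string match, falling back to symbolic equivalence."""
    if candidate is None:
        return False
    if candidate.strip() == reference.strip():
        return True
    try:
        return simplify(sympify(candidate) - sympify(reference)) == 0
    except Exception:
        return False

# Example: score a rollout against a closed-form reference answer.
rollout = "... so the total is \\boxed{3/4}."
print(answers_match(extract_boxed(rollout), "0.75"))  # True: 3/4 == 0.75 symbolically
```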
we’re at #COLM2025 🍁 come see our poster #26 (session 1) today! reach out ✉️ if you'd like to chat!
Qwen+RL = dramatic "Aha!"
Llama+RL = quick plateau
Same size. Same RL. Why?
Qwen naturally exhibits cognitive behaviors that Llama doesn't. Prime Llama with 4 synthetic reasoning patterns & it matched Qwen's self-improvement performance! We can engineer this into any model! 👇
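(Illustration only: the tweet doesn't name the four patterns, so the behaviors and templates below, roughly verification, backtracking, subgoal setting, and backward chaining, are assumptions. The sketch shows the general priming recipe: fine-tune on a handful of synthetic traces that exhibit the target patterns before running RL.)

```python
# Hypothetical sketch: wrap plain solutions in synthetic traces that exhibit the
# target reasoning patterns, then fine-tune on these pairs before RL.
# Pattern names and templates are illustrative assumptions, not the paper's data.
PATTERN_TEMPLATES = {
    "verification": "Let me check this step: {step}. Plugging it back in, it holds.",
    "backtracking": "Wait, that approach stalls. Let me go back and try {alternative}.",
    "subgoal_setting": "To solve this, I first need to {subgoal}.",
    "backward_chaining": "Working backwards from the target {goal}, I need {prerequisite}.",
}

def make_priming_example(question: str, solution_steps: list[str], answer: str) -> dict:
    """Interleave synthetic cognitive-behavior phrases with a plain solution.

    This toy example only uses two of the four templates (subgoal setting and
    verification); a fuller pipeline would mix in all of them.
    """
    trace = [PATTERN_TEMPLATES["subgoal_setting"].format(subgoal=solution_steps[0])]
    for step in solution_steps:
        trace.append(step)
        trace.append(PATTERN_TEMPLATES["verification"].format(step=step))
    trace.append(f"So the answer is {answer}.")
    return {"prompt": question, "completion": "\n".join(trace)}

example = make_priming_example(
    "What is 17 * 24?",
    ["17 * 24 = 17 * 20 + 17 * 4", "= 340 + 68 = 408"],
    "408",
)
print(example["completion"])
```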
the future is about smart tokens
What if models could learn which problems _deserve_ deep thinking? No labels. Just let the model discover difficulty through its own performance during training. Instead of burning compute 🔥💸 on trivial problems, it allocates 5x more on problems that actually need it ↓
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
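(A rough sketch of the idea as described in the thread; the exact penalty form, scaling, and coefficient below are assumptions, not the paper's. Each prompt's solve rate is estimated from its own group of rollouts, and the length penalty grows with that solve rate, so easy prompts are pushed to be concise while hard prompts keep their token budget.)

```python
import numpy as np

def alp_style_rewards(correct: np.ndarray, lengths: np.ndarray,
                      max_len: int, beta: float = 0.1) -> np.ndarray:
    """Length-penalized rewards for one prompt's group of RL rollouts.

    `correct` is a 0/1 vector, `lengths` the token counts. The solve rate is
    estimated from the group itself; the length penalty is scaled so that easy
    prompts (high solve rate) are penalized for long outputs while hard prompts
    (low solve rate) keep most of their budget. The exact scaling and beta are
    illustrative assumptions, not the paper's values.
    """
    solve_rate = correct.mean()                        # implicit difficulty estimate
    penalty = beta * solve_rate * (lengths / max_len)  # bigger penalty on easier prompts
    return correct.astype(float) - penalty

# Example: an easy prompt (3/4 solved) vs a hard one (1/4 solved), same lengths.
lengths = np.array([2000, 1500, 1800, 2200])
print(alp_style_rewards(np.array([1, 1, 1, 0]), lengths, max_len=4096))
print(alp_style_rewards(np.array([0, 0, 1, 0]), lengths, max_len=4096))
```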
10/10 Bottom line: ALP teaches models to think harder on hard problems and think less on easy ones - exactly what we want for efficient reasoning! 📄 Paper: https://t.co/XAqTaWsGBw 🔧 Models: huggingface.co
9/10 Shoutout to the amazing contributors from @synth_labs & @Stanford: @ZiyuX @ChaseBlagden @rm_rafailov @NathanThinks @sangttruong @chelseabfinn @nickhaber And to @tractoai @nebiusai for providing R&D compute!
8/10 Compared to other methods, ALP does not require specifying token budgets for different problem difficulties and adapts to model performance online. Unlike L1 (requires user budgets), ThinkPrune (uniform compression), or R1-Alpha (only penalizes correct solutions), ALP learns per-problem budgets from its own solve rates.
7/10 Where do these savings come from? Behavioral analysis reveals how models compress: ALP eliminates redundant exploration (-48%) and repetitive verification (-66%), but preserves structured planning.
6/10 We tested robustness: What happens when 60% of problems are competition-level hard? ALP gracefully scales up computation while fixed-budget methods fail. It automatically adapts to the actual difficulty distribution it encounters!
5/10 ALP doesn't just compress uniformly—it learns sophisticated resource allocation: average token usage drops 50%, only 21% of the token budget goes to the easiest 50% of problems, and hard problems get 5x more tokens than easy ones.
4/10 Results on DeepScaleR-1.5B are impressive: • 50% reduction in average token usage • Performance maintained • Zero additional training cost (leverages existing RL rollouts!) ⚡
3/10 ALP monitors each prompt's solve rate across multiple rollouts and applies inversely scaled penalties. No manual configuration needed - the model learns to allocate "just enough" computation automatically! 🎯
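(The same hedged form as the sketch earlier in the feed, written as a formula: $\hat{p}(x)$ is the empirical solve rate of prompt $x$ over its rollout group, $|y|$ the response length, $L_{\max}$ the token budget, and $\beta$ a small coefficient. The paper's exact penalty may differ.)

```latex
r(x, y) \;=\; \mathbb{1}\!\left[\text{correct}(x, y)\right] \;-\; \beta \,\hat{p}(x)\,\frac{|y|}{L_{\max}}
```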
2/10 The problem: Current reasoning models waste massive compute on easy problems while potentially under-thinking hard ones. Existing solutions? Either require manual user configuration or treat all problems the same.
Generative Reward Models' impact compounds daily. way stronger interest now than when we published last fall 👇 many excellent recent extensions—cool seeing where researchers take GenRM
we bootstrapped our way to generalized meta-reasoning capabilities with generative reward models
classical reward models can be worse than random on new reasoning tasks 🎲
we see improvements in robustness, generalization, interpretability and an opportunity to unify RLHF/RLAIF
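(For readers new to GenRM, a hypothetical sketch of the basic pattern: the reward model is itself an LLM that reasons before emitting a verdict, and the verdict becomes the preference signal. The prompt template, `generate` interface, and parsing below are assumptions, not the paper's setup.)

```python
# Hypothetical GenRM-style judging loop: the judge generates reasoning, then a verdict.
JUDGE_TEMPLATE = """You are grading two answers to the same question.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}

Think step by step about which answer is correct, then finish with
"Verdict: A" or "Verdict: B"."""

def generative_reward(generate, question: str, answer_a: str, answer_b: str) -> int:
    """Return +1 if the judge prefers A, -1 if it prefers B, 0 if unparseable.

    `generate` stands in for whatever LLM call you use; it is an assumed interface
    taking a prompt string and returning the judge's text.
    """
    judgment = generate(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    marker = "Verdict:"
    if marker not in judgment:
        return 0
    verdict = judgment.rsplit(marker, 1)[-1].strip().upper()
    return 1 if verdict.startswith("A") else -1 if verdict.startswith("B") else 0

# Toy usage with a stubbed judge (a real judge would be an LLM call):
stub = lambda prompt: "B slips at step 2, A checks out.\nVerdict: A"
print(generative_reward(stub, "2+2?", "4", "5"))  # 1 -> prefer A
```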
btw we have ongoing research on this front! we're open-science, pro-publication, and love collaboration. want to push this frontier forward? we're growing our SF team & always open to research partners—reach out, my DMs are open 📩
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment
scalable (synthetic) evaluation continues to be AI's key bottleneck!
Read how @synth_labs, a startup developing AI solutions tailored for logical reasoning, is advancing AI post-training with our @TractoAI: https://t.co/jePovolgcG
🔹 Goal: Develop an ML system that empowers reasoning models to surpass pattern matching and implement sophisticated …
btw, random fun fact we pointed out months ago: the only MATH example @OpenAI published with the o1 announcement included an unsubstantiated assumption 😬
> still hacks at a fairly high rate
> we wouldn't notice this agent was misaligned
Meanwhile, industry: aggressively distilling Meta-CoT slop directly into models 🫡
The final stop in our meetup series will be in San Francisco! 🌁 https://t.co/kK0d0Oszux Join us at Convene 100 Stockton near Union Square on Thursday, March 13, for a deep dive into our AI cloud. Our developers, AI R&D engineers and architects will share insights with the tech …
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Author's Explanation: https://t.co/ISZN8R5GWX
Overview: Big-Math, a dataset of over 250,000 high-quality math questions with verifiable answers, is purposefully designed for …
Big-Math: Massive Math Dataset for RL Training
- 10x larger than GSM8k/MATH
- 3 core properties: uniquely verifiable, open-ended, closed-form
- Human-validated 90%+ precision filters
- Difficulty metrics for curriculum learning
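(One way to use that difficulty signal for curriculum learning, sketched with the Hugging Face datasets library; the repo id and solve-rate column name below are assumptions, so check the dataset card.)

```python
from datasets import load_dataset

# Assumed repo id and column name; verify against the actual dataset card.
ds = load_dataset("SynthLabsAI/Big-Math-RL-Verified", split="train")

# Easy-to-hard curriculum: sort by an (assumed) per-problem solve-rate column,
# descending, so early RL batches see problems the base model already solves sometimes.
ds = ds.sort("llama8b_solve_rate", reverse=True)

# Or bucket into difficulty tiers for staged training.
def tier(example):
    p = example["llama8b_solve_rate"]
    example["tier"] = "easy" if p > 0.7 else "hard" if p < 0.2 else "medium"
    return example

ds = ds.map(tier)
```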