SynthLabs

@synth_labs

Followers: 15K · Following: 273 · Media: 30 · Statuses: 142

Scaling Up Good Synthetic Reasoning
We're hiring! ➡️ https://t.co/Gqtk6KrIy6 / DM 💬 @NathanThinks

Joined May 2022
@synth_labs
SynthLabs
8 months
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!
📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable—we eliminated 400k+ problems
Details below! 🧵👇
5 · 19 · 143
@NathanThinks
nathan lile
29 days
we're at #COLM2025 🍁
come see our poster #26 (session 1) today
reach out ✉️ if you'd like to chat!
@NathanThinks
nathan lile
8 months
Qwen+RL = dramatic, Aha!
Llama+RL = quick plateau
Same size. Same RL. Why?
Qwen naturally exhibits cognitive behaviors that Llama doesn't.
Prime Llama with 4 synthetic reasoning patterns & it matched Qwen's self-improvement performance!
We can engineer this into any model! 👇
1 · 7 · 47
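A rough sketch of the priming idea in the tweet above: build a tiny SFT set whose traces explicitly exhibit the reasoning behaviors before starting RL. The four behavior names, the toy question, the trace templates, and the file format are illustrative assumptions, not the authors' released data or code.

```python
import json

# Hypothetical trace templates, each exhibiting one assumed behavior on the same toy question.
BEHAVIORS = {
    "verification": "18 + 27 = 45, and 45 - 9 = 36; checking, 36 + 9 = 45 and 45 - 27 = 18, so 36 holds.",
    "backtracking": "Try 18 - 9 = 9 first... wait, that ignores the 27; backtrack: 18 + 27 = 45, then 45 - 9 = 36.",
    "subgoal_setting": "Subgoal 1: 18 + 27 = 45. Subgoal 2: 45 - 9 = 36.",
    "backward_chaining": "The answer plus 9 must equal 18 + 27 = 45, so the answer is 45 - 9 = 36.",
}

QUESTION = "Tom adds 18 and 27, then subtracts 9. What does he get?"

def make_priming_examples(question: str) -> list[dict]:
    """One SFT example per behavior, with the behavior made explicit in the trace."""
    return [
        {"prompt": question,
         "response": f"<think>[{name}] {trace}</think> Final answer: 36"}
        for name, trace in BEHAVIORS.items()
    ]

# Write a tiny SFT file; a real run would cover many questions, fine-tune the base
# model on it, and only then start RL from the primed checkpoint.
with open("priming_sft.jsonl", "w") as f:
    for ex in make_priming_examples(QUESTION):
        f.write(json.dumps(ex) + "\n")
```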
@tractoai
TractoAI
4 months
the future is about smart tokens
@NathanThinks
nathan lile
4 months
What if models could learn which problems _deserve_ deep thinking? No labels. Just let the model discover difficulty through its own performance during training. Instead of burning compute 🔥💸 on trivial problems, it allocates 5x more on problems that actually need it ↓
0 · 3 · 6
@synth_labs
SynthLabs
4 months
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
1 · 7 · 38
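For readers skimming the thread, here is a minimal sketch of an ALP-style shaped reward, assuming only what the tweets describe: per-prompt solve rates across RL rollouts act as an online difficulty estimate that scales a length penalty. The exact penalty form, normalization, and hyperparameter names below are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    prompt_id: int
    num_tokens: int
    correct: bool

def alp_rewards(rollouts, alpha=0.1, max_tokens=4096):
    # Online difficulty estimate: solve rate per prompt across its group of rollouts.
    by_prompt = {}
    for r in rollouts:
        by_prompt.setdefault(r.prompt_id, []).append(r)
    solve_rate = {pid: sum(r.correct for r in rs) / len(rs) for pid, rs in by_prompt.items()}

    rewards = []
    for r in rollouts:
        base = 1.0 if r.correct else 0.0  # verifiable correctness reward
        # Inverse-difficulty length penalty: easy prompts (high solve rate) pay the
        # largest per-token cost, so the model learns to keep those answers short,
        # while hard prompts (low solve rate) keep their token budget.
        penalty = alpha * solve_rate[r.prompt_id] * (r.num_tokens / max_tokens)
        rewards.append(base - penalty)
    return rewards

# Example: an easy prompt solved twice (short vs. long) and a hard prompt left unsolved.
print(alp_rewards([
    Rollout(0, 300, True), Rollout(0, 2000, True),  # solve rate 1.0 -> long answer penalized
    Rollout(1, 3500, False),                        # solve rate 0.0 -> no length penalty
]))
```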
@synth_labs
SynthLabs
4 months
10/10 Bottom line: ALP teaches models to think harder on hard problems and think less on easy ones - exactly what we want for efficient reasoning!
📄 Paper: https://t.co/XAqTaWsGBw
🔧 Models: huggingface.co
0 · 0 · 4
@synth_labs
SynthLabs
4 months
9/10 Shoutout to the amazing contributors from @synth_labs & @Stanford: @ZiyuX @ChaseBlagden @rm_rafailov @NathanThinks @sangttruong @chelseabfinn @nickhaber
And to @tractoai @nebiusai for providing R&D compute!
2 · 0 · 6
@synth_labs
SynthLabs
4 months
8/10 Compared to other methods, ALP does not require specifying token budgets for different problem difficulties and adapts to model performance online. Unlike L1 (requires user budgets), ThinkPrune (uniform compression), or R1-Alpha (only penalizes correct solutions), ALP adapts automatically as training progresses.
1 · 0 · 4
@synth_labs
SynthLabs
4 months
7/10 Where do these savings come from? Behavioral analysis reveals how models compress: ALP eliminates redundant exploration (-48%) and repetitive verification (-66%), but preserves structured planning.
1 · 0 · 4
@synth_labs
SynthLabs
4 months
6/10 We tested robustness: What happens when 60% of problems are competition-level hard? ALP gracefully scales up computation while fixed-budget methods fail. It automatically adapts to the actual difficulty distribution it encounters!
1 · 0 · 4
@synth_labs
SynthLabs
4 months
5/10 ALP doesn't just compress uniformly—it learns sophisticated resource allocation: average token usage drops 50%, only 21% of the token budget goes to the easiest 50% of problems, and hard problems get 5x more tokens than easy ones.
1 · 0 · 4
@synth_labs
SynthLabs
4 months
4/10 Results on DeepScaleR-1.5B are impressive:
• 50% reduction in average token usage
• Performance maintained
• Zero additional training cost (leverages existing RL rollouts!) ⚡
1 · 0 · 4
@synth_labs
SynthLabs
4 months
3/10 ALP monitors each prompt's solve rate across multiple rollouts and applies inversely scaled penalties. No manual configuration needed - the model learns to allocate "just enough" computation automatically! 🎯
1 · 0 · 5
@synth_labs
SynthLabs
4 months
2/10 The problem: Current reasoning models waste massive compute on easy problems while potentially under-thinking hard ones. Existing solutions? Either require manual user configuration or treat all problems the same.
1 · 0 · 4
@synth_labs
SynthLabs
4 months
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
2 · 10 · 34
@NathanThinks
nathan lile
5 months
Generative Reward Models' impact compounds daily. way stronger interest now than when we published last fall 👇
many excellent recent extensions—cool seeing where researchers take GenRM
@NathanThinks
nathan lile
1 year
we bootstrapped our way to generalized meta-reasoning capabilities with generative reward models
classical reward models can be worse than random on new reasoning tasks 🎲
we see improvements in robustness, generalization, interpretability and an opportunity to unify RLHF/RLAIF
1 · 3 · 19
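A minimal sketch of the generative-reward-model idea described in the tweets above: instead of a classifier head producing a scalar score, a judge LLM generates a rationale and a verdict, and the verdict becomes the reward. The prompt wording and the placeholder `call_llm` are assumptions for illustration, not the paper's code.

```python
JUDGE_TEMPLATE = """You are verifying an answer.

Question: {question}
Proposed answer: {answer}

Think step by step, then end with exactly one line: "Verdict: Yes" or "Verdict: No"."""

def call_llm(prompt: str) -> str:
    """Placeholder judge (hypothetical): swap in any chat model call you actually use."""
    return "18 + 27 = 45 and 45 - 9 = 36, which matches.\nVerdict: Yes"

def generative_reward(question: str, answer: str) -> float:
    """Reward = 1.0 iff the judge's *generated* verdict is Yes (no scalar reward head)."""
    judgment = call_llm(JUDGE_TEMPLATE.format(question=question, answer=answer))
    verdict = judgment.strip().splitlines()[-1].lower()
    return 1.0 if "yes" in verdict else 0.0

print(generative_reward("Tom adds 18 and 27, then subtracts 9. What does he get?", "36"))
```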
@NathanThinks
nathan lile
6 months
btw we have ongoing research on this front! we're open-science, pro-publication, and love collaboration. want to push this frontier forward? we're growing our SF team & always open to research partners—reach out, my DMs are open 📩
@NathanThinks
nathan lile
6 months
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment
scalable (synthetic) evaluation continues to be AI's key bottleneck!
17 · 9 · 56
@nebiusai
Nebius
7 months
Read how @synth_labs, a startup developing AI solutions tailored for logical reasoning, is advancing AI post-training with our @TractoAI: https://t.co/jePovolgcG
🔹 Goal: Develop an ML system that empowers reasoning models to surpass pattern matching and implement sophisticated
2 · 14 · 59
@NathanThinks
nathan lile
8 months
btw, random fun fact we pointed out months ago: the only MATH example @OpenAI published with the o1 announcement included an unsubstantiated assumption 😬
@NathanThinks
nathan lile
8 months
> still hacks at a fairly high rate
> we wouldn't notice this agent was misaligned
Meanwhile, industry: aggressively distilling Meta-CoT slop directly into models 🫡
1 · 6 · 33
@nebiusai
Nebius
8 months
The final stop in our meetup series will be in San Francisco! 🌁 https://t.co/kK0d0Oszux Join us at Convene 100 Stockton near Union Square on Thursday, March 13, for a deep dive into our AI cloud. Our developers, AI R&D engineers and architects will share insights with the tech
1 · 4 · 31
@TheAITimeline
The AI Timeline
8 months
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Author's Explanation: https://t.co/ISZN8R5GWX
Overview: Big-Math, a dataset of over 250,000 high-quality math questions with verifiable answers, is purposefully designed for
@synth_labs
SynthLabs
8 months
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!
📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable—we eliminated 400k+ problems
Details below! 🧵👇
1 · 2 · 7
@vanstriendaniel
Daniel van Strien
8 months
Big-Math: Massive Math Dataset for RL Training
- 10x larger than GSM8k/MATH
- 3 core properties: uniquely verifiable, open-ended, closed-form
- Human-validated 90%+ precision filters
- Difficulty metrics for curriculum learning
2 · 26 · 133
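To make the "uniquely verifiable" property concrete, here is a small sketch of loading a Big-MATH-style dataset and using exact-match as the RL reward signal. The Hugging Face repo id and column names below are assumptions (the tweets only link huggingface.co), so adjust them to the released dataset card.

```python
from datasets import load_dataset

# Repo id assumed for illustration; check the actual Big-MATH dataset card on huggingface.co.
ds = load_dataset("SynthLabs/big_math", split="train")

def exact_match_reward(model_answer: str, reference: str) -> float:
    """Verifiable reward: 1.0 iff the model's final answer matches the reference string."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

example = ds[0]
print(example["problem"][:200])                     # "problem" column name assumed
print(exact_match_reward("36", example["answer"]))  # "answer" column name assumed
```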