PINK ELEPHANTS! 🐘 Now, don’t think about it.
Chatbots also find this supremely difficult.
Ask one of the most popular open source models NOT to talk about pink elephants, and it will fail 34% of the time.
In our new paper, we address this problem.
1/N
Telling a language model not to mention something can, paradoxically, increase the odds that it does. Similarly, as Gary Marcus noticed, when DALL-E 3 is prompted to draw a room without elephants, it consistently adds elephants to the image. 2/N
We define the Pink Elephant Problem as follows: given a Pink Elephant (a topic to avoid) and a Grey Elephant (a preferred alternative), the model should discuss the Grey Elephant whenever the Pink Elephant is brought up. 3/N
For example, if you deploy a bot that gives students information about British universities, say because you run a company that helps students apply to British universities, it's probably not the best decision for that bot to help students apply to American universities. 4/N
We also present Direct Principle Feedback (DPF) as a way to address this. Rather than relying on reranking, we use the before/after of a revision as a pairwise preference. 5/N
Producing quality pairwise preferences with and without Pink Elephants becomes easy with DPF, as the revision step lets us filter and control the removal of the Pink Elephant directly. 6/N
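To make 5/N and 6/N concrete, here is a minimal sketch of DPF pair generation. This is not the paper's actual code: `chat` stands in for any chat-completion call, and the principle and filter are toy examples.

```python
# Toy sketch of DPF data generation; `chat` is a stand-in, not an API from the paper.

PINK = "pink elephant"
PRINCIPLE = f"Do not mention the {PINK}; talk about the grey elephant instead."

def chat(messages: list[dict]) -> str:
    """Stand-in for any chat-completion call (e.g., an OpenAI-style client)."""
    raise NotImplementedError

def dpf_pair(dialogue: list[dict]) -> dict | None:
    # 1. Sample the original (possibly principle-violating) response.
    original = chat(dialogue)
    # 2. Ask the model to revise its response so that it follows the principle.
    revised = chat(dialogue + [
        {"role": "assistant", "content": original},
        {"role": "user",
         "content": f"Revise your last reply so that it satisfies: {PRINCIPLE}"},
    ])
    # 3. Filter: keep the pair only if the revision actually removed the
    #    Pink Elephant (the "filter and control its removal" step in 6/N).
    if PINK in revised.lower():
        return None
    # 4. The revision is "chosen", the original is "rejected":
    #    the pair is ranked by construction, no reranking needed.
    return {"prompt": dialogue, "chosen": revised, "rejected": original}
```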
DPF is a simplification of common RLHF pipelines based on Constitutional AI: we skip the sampling-and-ranking step by noticing that the original generation and the revised generation form a naturally ranked pair that can be plugged directly into a preference-learning method. 7/N
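Because each pair arrives pre-ranked, it can drop straight into an off-the-shelf preference learner such as DPO. A minimal sketch using HuggingFace trl (argument names vary across trl versions; the model id and hyperparameters are assumptions, and the rows come from the sketch above):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Rows produced by dpf_pair() above; the strings here are placeholders.
pairs = Dataset.from_list([
    {"prompt": "...", "chosen": "revised reply", "rejected": "original reply"},
])

model_name = "teknium/OpenHermes-13b"  # HF id assumed for the model named in 8/N
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpf-dpo", beta=0.1),
    train_dataset=pairs,
    processing_class=tokenizer,  # `tokenizer=` in older trl versions
)
trainer.train()
```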
We show that by applying DPF to OpenHermes-13B, our model, when instructed, avoids the Pink Elephant almost as well as GPT-4 does! Notice the “With Prompt” column. 8/N
By having fine-grained control over pairwise preference generation, we open the door to a new set of approachable RLAIF problems. DPF can readily be used for tool-assisted RLAIF, as rewriting utterances with the use of tools becomes trivial with DPF! 9/N
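A hypothetical sketch of that tool-assisted variant, reusing the `chat` stand-in from the first sketch: the revision is conditioned on a tool result, so the "chosen" side of each pair is grounded in the tool's output.

```python
def tool_assisted_pair(dialogue: list[dict], tool_output: str) -> dict:
    """Hypothetical tool-assisted DPF; `chat` is the stand-in defined above."""
    original = chat(dialogue)  # unassisted draft
    revised = chat(dialogue + [
        {"role": "assistant", "content": original},
        {"role": "user",
         "content": f"Rewrite your last reply so it is consistent with "
                    f"this tool output: {tool_output}"},
    ])
    return {"prompt": dialogue, "chosen": revised, "rejected": original}
```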
We're hiring for all roles.
Open-science work we have in progress:
1) RLAIF for pretraining (we're building open-source datasets).
2) Benchmarks, benchmarks, benchmarks.
3) Collaborating with @AiEleuther on some awesome projects.
Work with us.