Chase Blagden
@ChaseBlagden
Followers
183
Following
845
Media
21
Statuses
170
Joined October 2023
https://t.co/IfcKBKkDnH The original GRPO work, which has been noted to be mathematically inconsistent (Zhang et al., 2025, https://t.co/H8FZPFLNIj; Tang et al., 2025, https://t.co/CtRhZaJON3)
arxiv.org
We point out a few pitfalls in implementing gradient estimation for KL divergence in RL training for LLM, as seen in a number of open source projects and papers. The first major pitfall is to...
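As context for the linked paper, here is a minimal sketch (PyTorch assumed, names illustrative) of the standard per-sample value estimators of KL(π_θ ‖ π_ref) evaluated on samples from π_θ; how such value estimators get turned into gradient estimates, and what goes wrong with off-policy samples, is what the paper and the thread below discuss:

```python
import torch

def kl_value_estimators(logp_theta: torch.Tensor, logp_ref: torch.Tensor):
    """Monte Carlo value estimators of KL(pi_theta || pi_ref) on samples drawn
    from pi_theta. Inputs are log-probs of the sampled tokens under the current
    policy and the reference policy."""
    log_ratio = logp_theta - logp_ref           # log(pi_theta / pi_ref)
    k1 = log_ratio                              # unbiased, high variance
    k2 = 0.5 * log_ratio ** 2                   # biased, lower variance
    k3 = torch.exp(-log_ratio) - 1 + log_ratio  # unbiased, low variance (the GRPO choice)
    return k1.mean(), k2.mean(), k3.mean()

# toy check against an exact categorical KL
p = torch.tensor([0.5, 0.3, 0.2])   # "policy"
q = torch.tensor([0.2, 0.5, 0.3])   # "reference"
x = torch.multinomial(p, 20000, replacement=True)
est = kl_value_estimators(torch.log(p[x]), torch.log(q[x]))
exact = (p * (p / q).log()).sum()
print(est, exact)  # k1 and k3 should land close to the exact value
```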
@rm_rafailov @kalomaze Elaborate? GRPO appears to be unprincipled, but in what way is it incorrect?
2
7
143
@kalomaze You are wrong on like three different levels, but something that will blow your mind: GRPO was first published in the DPO paper under the name “PPO-ours”, which used a group size of 4 (but our version was mathematically correct, unlike the actual “GRPO”).
2
5
122
Open source adopting the Tinker API and making it easy to spin up training engines (e.g. `train serve ...`) will be a huge unlock for research
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
1
2
6
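A purely hypothetical sketch of the pattern these two tweets describe, where the loop is written and run locally while forward/backward and optimizer state live on a remote GPU service. None of the names below are Tinker's actual API; the client is a local stub, illustration only:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    loss: float

class RemoteTrainingClient:
    """Local stub standing in for a client that ships forward/backward and
    optimizer steps to a distributed GPU backend (hypothetical names)."""

    def __init__(self, base_model: str):
        self.base_model = base_model

    def forward_backward(self, batch: dict) -> StepResult:
        # A real service would run the model remotely and return the loss.
        return StepResult(loss=0.0)

    def optim_step(self, lr: float) -> None:
        # Optimizer state would live server-side.
        pass

client = RemoteTrainingClient(base_model="some-open-model")
for batch in [{"input_ids": [1, 2, 3]}] * 3:   # placeholder data
    result = client.forward_backward(batch)
    client.optim_step(lr=1e-5)
    print(result.loss)
```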
It’s weird how people still blindly copy it. There was a whole paper about this.
@zjasper666 The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
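A plausible reconstruction of the estimator being described, with μ denoting the rollout (behavior) policy, i.e. π_old; the exact expression in the attached formula may differ:

$$
\widehat{\mathrm{KL}}\!\left(\pi_\theta \,\middle\|\, \pi_{\mathrm{ref}}\right)
= \mathbb{E}_{y \sim \mu(\cdot \mid x)}\!\left[
\frac{\pi_\theta(y \mid x)}{\mu(y \mid x)}
\left(
\frac{\pi_{\mathrm{ref}}(y \mid x)}{\pi_\theta(y \mid x)}
- \log \frac{\pi_{\mathrm{ref}}(y \mid x)}{\pi_\theta(y \mid x)}
- 1
\right)
\right]
$$

i.e. the k3 term GRPO already uses, reweighted by the importance ratio π_θ/μ because the rollouts were sampled from μ rather than from the current policy π_θ.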
3
17
286
Another great writeup from @cHHillee. They share the kernels too! 🎉 https://t.co/gz9PQ7pNE3
github.com
thinking-machines-lab/batch_invariant_ops
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
0
0
0
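A small sketch of the effect the post describes, assuming a PyTorch setup: the same row pushed through the same matmul can come out bitwise different depending on the batch it rides in, because different batch sizes can select different kernels and reduction orders. On CPU the two often match exactly; the point of batch-invariant kernels is to make them match everywhere:

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 4096)
W = torch.randn(4096, 4096)
filler = torch.randn(7, 4096)

y_alone = x @ W                                  # the request processed by itself
y_in_batch = (torch.cat([x, filler]) @ W)[:1]    # the same request inside a larger batch

print(torch.equal(y_alone, y_in_batch))          # may be False, especially on GPU
print((y_alone - y_in_batch).abs().max())        # typically zero or a few ULPs
```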
Great to see the @synth_labs Big-MATH dataset used in the @thinkymachines blog!
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
1
0
4
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
2
10
34
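A minimal sketch of the idea as summarized in the tweet; the function name, the linear penalty form, and the β coefficient are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def alp_style_rewards(correct: np.ndarray, lengths: np.ndarray, beta: float = 0.01) -> np.ndarray:
    """Inverse-difficulty length penalty for one prompt's group of rollouts.

    correct: shape (G,) 0/1 outcomes of G rollouts.
    lengths: shape (G,) token counts of those rollouts.
    """
    solve_rate = correct.mean()              # online difficulty estimate: high = easy
    penalty = beta * solve_rate * lengths    # easy prompts pay more per token
    return correct.astype(float) - penalty   # hard prompts keep their token budget

# example: an easy prompt (75% solved) is penalized for long answers,
# a hard prompt (0% solved) is not
easy = alp_style_rewards(np.array([1, 1, 1, 0]), np.array([200, 250, 300, 400]))
hard = alp_style_rewards(np.array([0, 0, 0, 0]), np.array([800, 900, 850, 950]))
print(easy, hard)
```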
0
0
9
btw we have ongoing research on this front! we're open-science, pro-publication, and love collaboration. want to push this frontier forward? we're growing our SF team & always open to research partners—reach out, my DMs are open 📩
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment. Scalable (synthetic) evaluation continues to be AI's key bottleneck!
17
9
56
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment. Scalable (synthetic) evaluation continues to be AI's key bottleneck!
🚨 New paper 🚨 J1: Incentivizing Thinking in LLM-as-a-Judge via RL
- Converts judgement task into a verifiable one for both verifiable and non-verifiable prompts. Uses only synthetic pairwise data
- Optimizes thoughts, scores, and judgments using GRPO
- Outperforms all
1
12
95
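A minimal sketch of how a judgment task can be made verifiable under the setup described (synthetic pairs where the preferred response is known by construction, judged by a group of sampled rollouts and scored with GRPO-style group normalization); the names and the verdict parsing are assumptions, not J1's implementation:

```python
import numpy as np

def judge_reward(verdict: str, preferred: str) -> float:
    """1.0 if the judge's parsed verdict ('A' or 'B') picks the response
    known to be better by construction, else 0.0."""
    return 1.0 if verdict.strip().upper() == preferred else 0.0

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style group-normalized advantages over G rollouts for one pair."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# example: 4 sampled judge rollouts on a pair where 'A' is preferred
verdicts = ["A", "B", "A", "A"]
rewards = np.array([judge_reward(v, "A") for v in verdicts])
print(grpo_advantages(rewards))
```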
Our first long-horizon agentic software engineering model is here! We've shipped a model that matches Claude on Cascade in a lot of ways. However, the most exciting thing about this release is the trajectory we're on. So much left to do... we're hiring!
Wave 9 is here: a frontier model built for software engineering. Introducing our new family of models: SWE-1, SWE-1-lite, and SWE-1-mini. Based on internal evals, it has performance nearing that of frontier models from the foundation labs. Available now, only in Windsurf!
0
4
34
We have released VARS-fUSI: Variable sampling for fast and efficient functional ultrasound imaging (fUSI) using neural operators. The first deep learning fUSI method to allow for different sampling durations and rates during training and inference. https://t.co/hHoWJozejz 1/
1
17
49
Why did only humans invent graphical systems like writing? 🧠✍️ In our new paper at @cogsci_soc, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇
23
186
1K
Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your competitors' models are ... thinking a bit too much like yours? Then https://t.co/qwVitSQK6o might be for you! @sama @elonmusk
5
30
141
Read how @synth_labs, a startup developing AI solutions tailored for logical reasoning, is advancing AI post-training with our @TractoAI: https://t.co/jePovolgcG
🔹 Goal: Develop an ML system that empowers reasoning models to surpass pattern matching and implement sophisticated
2
14
59
Happy to finally announce Big-MATH, the largest math reasoning dataset purposefully designed for large-scale RL! We worked tirelessly, cleaning and filtering math datasets so that you don't have to!
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!
📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable—we eliminated 400k+ problems
Details below! 🧵👇
5
16
125
Start exploring Big-MATH today!
📄 Paper: https://t.co/U03ogBwu7Y
💻 Code: https://t.co/suwAiD8hTG
📂 Dataset:
huggingface.co
0
2
6
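A quick way to start exploring, assuming the `datasets` library is installed; the exact Hugging Face dataset ID below is an assumption, so confirm it against the link above:

```python
from datasets import load_dataset

# dataset ID is an assumption (e.g. "SynthLabsAI/Big-Math-RL-Verified");
# confirm against the Hugging Face link in the tweet above
ds = load_dataset("SynthLabsAI/Big-Math-RL-Verified", split="train")

print(len(ds))   # ~250k problems per the announcement
print(ds[0])     # one problem record: question, verifiable answer, metadata
```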