Chase Blagden
@ChaseBlagden
Followers
183
Following
845
Media
21
Statuses
170
Joined October 2023
https://t.co/IfcKBKkDnH The original GRPO work, which has been noted to be mathematically inconsistent (Zhang et al., 2025, https://t.co/H8FZPFLNIj; Tang et al., 2025, https://t.co/CtRhZaJON3)
arxiv.org
We point out a few pitfalls in implementing gradient estimation for KL divergence in RL training for LLM, as seen in a number of open source projects and papers. The first major pitfall is to...
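As context for the linked paper, here is a minimal sketch (PyTorch assumed, names illustrative) of the standard per-sample value estimators of KL(π_θ ‖ π_ref) evaluated on samples from π_θ; how such value estimators get turned into gradient estimates, and what goes wrong with off-policy samples, is what the paper and the thread below discuss:

```python
import torch

def kl_value_estimators(logp_theta: torch.Tensor, logp_ref: torch.Tensor):
    """Monte Carlo value estimators of KL(pi_theta || pi_ref) on samples drawn
    from pi_theta. Inputs are log-probs of the sampled tokens under the current
    policy and the reference policy."""
    log_ratio = logp_theta - logp_ref           # log(pi_theta / pi_ref)
    k1 = log_ratio                              # unbiased, high variance
    k2 = 0.5 * log_ratio ** 2                   # biased, lower variance
    k3 = torch.exp(-log_ratio) - 1 + log_ratio  # unbiased, low variance (the GRPO choice)
    return k1.mean(), k2.mean(), k3.mean()

# toy check against an exact categorical KL
p = torch.tensor([0.5, 0.3, 0.2])   # "policy"
q = torch.tensor([0.2, 0.5, 0.3])   # "reference"
x = torch.multinomial(p, 20000, replacement=True)
est = kl_value_estimators(torch.log(p[x]), torch.log(q[x]))
exact = (p * (p / q).log()).sum()
print(est, exact)  # k1 and k3 should land close to the exact value
```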
@rm_rafailov @kalomaze Elaborate? GRPO appears to be unprincipled, but in what way is it incorrect?
2
7
143
@kalomaze You are wrong on like three different levels, but something that will blow your mind: GRPO was first published in the DPO paper under the name “PPO-ours”, which used a group size of 4 (but our version was mathematically correct, unlike the actual “GRPO”).
2
5
122
Open source adopting the Tinker API and making it easy to spin up training engines (e.g. `train serve ...`) will be a huge unlock for research
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
1
2
6
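A purely hypothetical sketch of the pattern these two tweets describe, where the loop is written and run locally while forward/backward and optimizer state live on a remote GPU service. None of the names below are Tinker's actual API; the client is a local stub, illustration only:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    loss: float

class RemoteTrainingClient:
    """Local stub standing in for a client that ships forward/backward and
    optimizer steps to a distributed GPU backend (hypothetical names)."""

    def __init__(self, base_model: str):
        self.base_model = base_model

    def forward_backward(self, batch: dict) -> StepResult:
        # A real service would run the model remotely and return the loss.
        return StepResult(loss=0.0)

    def optim_step(self, lr: float) -> None:
        # Optimizer state would live server-side.
        pass

client = RemoteTrainingClient(base_model="some-open-model")
for batch in [{"input_ids": [1, 2, 3]}] * 3:   # placeholder data
    result = client.forward_backward(batch)
    client.optim_step(lr=1e-5)
    print(result.loss)
```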
It’s weird how people still blindly copy it. There was a whole paper about this.
@zjasper666 The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
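A plausible reconstruction of the estimator being described, with μ denoting the rollout (behavior) policy, i.e. π_old; the exact expression in the attached formula may differ:

$$
\widehat{\mathrm{KL}}\!\left(\pi_\theta \,\middle\|\, \pi_{\mathrm{ref}}\right)
= \mathbb{E}_{y \sim \mu(\cdot \mid x)}\!\left[
\frac{\pi_\theta(y \mid x)}{\mu(y \mid x)}
\left(
\frac{\pi_{\mathrm{ref}}(y \mid x)}{\pi_\theta(y \mid x)}
- \log \frac{\pi_{\mathrm{ref}}(y \mid x)}{\pi_\theta(y \mid x)}
- 1
\right)
\right]
$$

i.e. the k3 term GRPO already uses, reweighted by the importance ratio π_θ/μ because the rollouts were sampled from μ rather than from the current policy π_θ.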
3
17
286
Another great writeup from @cHHillee. They share the kernels too! 🎉 https://t.co/gz9PQ7pNE3
github.com
thinking-machines-lab/batch_invariant_ops
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
0
0
0
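A small sketch of the effect the post describes, assuming a PyTorch setup: the same row pushed through the same matmul can come out bitwise different depending on the batch it rides in, because different batch sizes can select different kernels and reduction orders. On CPU the two often match exactly; the point of batch-invariant kernels is to make them match everywhere:

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 4096)
W = torch.randn(4096, 4096)
filler = torch.randn(7, 4096)

y_alone = x @ W                                  # the request processed by itself
y_in_batch = (torch.cat([x, filler]) @ W)[:1]    # the same request inside a larger batch

print(torch.equal(y_alone, y_in_batch))          # may be False, especially on GPU
print((y_alone - y_in_batch).abs().max())        # typically zero or a few ULPs
```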
Great to see the @synth_labs Big-MATH dataset used in the @thinkymachines blog!
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
1
0
4
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
2
10
34
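A minimal sketch of the idea as summarized in the tweet; the function name, the linear penalty form, and the β coefficient are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def alp_style_rewards(correct: np.ndarray, lengths: np.ndarray, beta: float = 0.01) -> np.ndarray:
    """Inverse-difficulty length penalty for one prompt's group of rollouts.

    correct: shape (G,) 0/1 outcomes of G rollouts.
    lengths: shape (G,) token counts of those rollouts.
    """
    solve_rate = correct.mean()              # online difficulty estimate: high = easy
    penalty = beta * solve_rate * lengths    # easy prompts pay more per token
    return correct.astype(float) - penalty   # hard prompts keep their token budget

# example: an easy prompt (75% solved) is penalized for long answers,
# a hard prompt (0% solved) is not
easy = alp_style_rewards(np.array([1, 1, 1, 0]), np.array([200, 250, 300, 400]))
hard = alp_style_rewards(np.array([0, 0, 0, 0]), np.array([800, 900, 850, 950]))
print(easy, hard)
```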
0
0
9
btw we have ongoing research on this front! we're open-science, pro-publication, and love collaboration. want to push this frontier forward? we're growing our SF team & always open to research partners—reach out, my DMs are open 📩
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment. Scalable (synthetic) evaluation continues to be AI's key bottleneck!
17
9
56
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment. Scalable (synthetic) evaluation continues to be AI's key bottleneck!
🚨 New paper 🚨 J1: Incentivizing Thinking in LLM-as-a-Judge via RL
- Converts judgement task into a verifiable one for both verifiable and non-verifiable prompts. Uses only synthetic pairwise data
- Optimizes thoughts, scores, and judgments using GRPO
- Outperforms all
1
12
95
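A minimal sketch of how a judgment task can be made verifiable under the setup described (synthetic pairs where the preferred response is known by construction, judged by a group of sampled rollouts and scored with GRPO-style group normalization); the names and the verdict parsing are assumptions, not J1's implementation:

```python
import numpy as np

def judge_reward(verdict: str, preferred: str) -> float:
    """1.0 if the judge's parsed verdict ('A' or 'B') picks the response
    known to be better by construction, else 0.0."""
    return 1.0 if verdict.strip().upper() == preferred else 0.0

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style group-normalized advantages over G rollouts for one pair."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# example: 4 sampled judge rollouts on a pair where 'A' is preferred
verdicts = ["A", "B", "A", "A"]
rewards = np.array([judge_reward(v, "A") for v in verdicts])
print(grpo_advantages(rewards))
```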
Our first long-horizon agentic software engineering model is here! We've shipped a model that matches Claude on Cascade in a lot of ways. However, the most exciting thing about this release is the trajectory we're on. So much left to do... we're hiring!
Wave 9 is here: a frontier model built for software engineering. Introducing our new family of models: SWE-1, SWE-1-lite, and SWE-1-mini. Based on internal evals, it has performance nearing that of frontier models from the foundation labs. Available now, only in Windsurf!
0
4
34
We have released VARS-fUSI: Variable sampling for fast and efficient functional ultrasound imaging (fUSI) using neural operators. The first deep learning fUSI method to allow for different sampling durations and rates during training and inference. https://t.co/hHoWJozejz 1/
1
17
49
Why did only humans invent graphical systems like writing? 🧠✍️ In our new paper at @cogsci_soc, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇
23
186
1K
Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your competitors' models are ... thinking a bit too much like yours? Then https://t.co/qwVitSQK6o might be for you! @sama @elonmusk
5
30
141
Read how @synth_labs, a startup developing AI solutions tailored for logical reasoning, is advancing AI post-training with our @TractoAI: https://t.co/jePovolgcG
🔹 Goal: Develop an ML system that empowers reasoning models to surpass pattern matching and implement sophisticated
2
14
59
Happy to finally announce Big-MATH, the largest math reasoning dataset purposefully designed for large-scale RL! We worked tirelessly, cleaning and filtering math datasets so that you don't have to!
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!
📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable—we eliminated 400k+ problems
Details below! 🧵👇
5
16
125
Start exploring Big-MATH today!
📄 Paper: https://t.co/U03ogBwu7Y
💻 Code: https://t.co/suwAiD8hTG
📂 Dataset:
huggingface.co
0
2
6
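A quick way to start exploring, assuming the `datasets` library is installed; the exact Hugging Face dataset ID below is an assumption, so confirm it against the link above:

```python
from datasets import load_dataset

# dataset ID is an assumption (e.g. "SynthLabsAI/Big-Math-RL-Verified");
# confirm against the Hugging Face link in the tweet above
ds = load_dataset("SynthLabsAI/Big-Math-RL-Verified", split="train")

print(len(ds))   # ~250k problems per the announcement
print(ds[0])     # one problem record: question, verifiable answer, metadata
```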