Chase Blagden Profile
Chase Blagden

@ChaseBlagden

Followers: 183
Following: 845
Media: 21
Statuses: 170

Joined October 2023
@yifan_zhang_
Yifan Zhang
12 days
https://t.co/IfcKBKkDnH The original GRPO work, which has been noted to be mathematically inconsistent (Zhang et al., 2025, https://t.co/H8FZPFLNIj; Tang et al., 2025, https://t.co/CtRhZaJON3)
arxiv.org
We point out a few pitfalls in implementing gradient estimation for KL divergence in RL training for LLM, as seen in a number of open source projects and papers. The first major pitfall is to...
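For background on the estimator these threads keep referring to: the "k3" estimator is presumably the one from Schulman's "Approximating KL Divergence" note. For the reverse KL $D_{\mathrm{KL}}(\pi_\theta \,\|\, \pi_{\mathrm{ref}})$ it scores each sampled completion $o$ for prompt $q$ as

$$
k_3(o) \;=\; \frac{\pi_{\mathrm{ref}}(o \mid q)}{\pi_\theta(o \mid q)} \;-\; \log \frac{\pi_{\mathrm{ref}}(o \mid q)}{\pi_\theta(o \mid q)} \;-\; 1,
$$

which is non-negative and is an unbiased estimate of the KL only when the samples are actually drawn from $\pi_\theta$.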
@neelsomani
Neel Somani
12 days
@rm_rafailov @kalomaze Elaborate? GRPO appears to be unprincipled, but in what way is it incorrect?
2
7
143
@rm_rafailov
Rafael Rafailov
12 days
@kalomaze You are wrong on like three different levels, but something that will blow your mind - GRPO was first published in the DPO paper under the name “PPO-ours”, which used a group size of 4 (but our version was mathematically correct, unlike the actual “GRPO”).
2
5
122
@ChaseBlagden
Chase Blagden
2 months
Open source adopting the Tinker API and making it easy to spin up training engines (e.g. `train serve ...`) will be a huge unlock for research.
@thinkymachines
Thinking Machines
2 months
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
1
2
6
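To make the split concrete, here is a purely hypothetical sketch of the pattern the announcement describes (none of these names are from the real Tinker API): the loop and any custom logic run as ordinary local Python, while forward/backward and optimizer steps execute on hosted GPUs.

```python
# Hypothetical illustration only; not the actual Tinker API.
from typing import Iterable, Protocol


class RemoteTrainer(Protocol):
    """Stand-in for a hosted fine-tuning client (made-up interface)."""

    def forward_backward(self, batch: dict) -> float: ...  # executes on remote GPUs, returns loss
    def optim_step(self, lr: float) -> None: ...           # applies the update remotely


def train(client: RemoteTrainer, batches: Iterable[dict], lr: float = 1e-5) -> None:
    # The control flow stays on your laptop; only the heavy compute is remote.
    for step, batch in enumerate(batches):
        loss = client.forward_backward(batch)
        client.optim_step(lr)
        print(f"step {step}: loss {loss:.4f}")
```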
@rm_rafailov
Rafael Rafailov
2 months
It’s weird how people still blindly copy it. There was a whole paper about this.
@QuanquanGu
Quanquan Gu
2 months
@zjasper666 The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
3
17
286
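The "correct formulation" referred to above isn't captured in this text-only capture of the thread; presumably it reweights the k3 term by the importance ratio so the expectation is taken under the sampling policy $\pi_{\mathrm{old}}$, roughly

$$
\widehat{D}_{\mathrm{KL}}(\pi_\theta \,\|\, \pi_{\mathrm{ref}})
\;=\; \mathbb{E}_{o \sim \pi_{\mathrm{old}}}\!\left[
\frac{\pi_\theta(o \mid q)}{\pi_{\mathrm{old}}(o \mid q)}
\left( \frac{\pi_{\mathrm{ref}}(o \mid q)}{\pi_\theta(o \mid q)}
- \log \frac{\pi_{\mathrm{ref}}(o \mid q)}{\pi_\theta(o \mid q)} - 1 \right)
\right],
$$

which reduces to the on-policy k3 estimator when $\pi_{\mathrm{old}} = \pi_\theta$. Treat this as a sketch of the argument, not the exact expression from the linked papers.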
@ChaseBlagden
Chase Blagden
2 months
Another great writeup from @cHHillee. They share the kernels too! 🎉 https://t.co/gz9PQ7pNE3
github.com
Contribute to thinking-machines-lab/batch_invariant_ops development by creating an account on GitHub.
@thinkymachines
Thinking Machines
2 months
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
0
0
0
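The underlying issue the post and repo address can be seen in a few lines (this is not code from batch_invariant_ops, just a generic demonstration that floating-point reductions are order-dependent, which is why results can change with batch size or kernel choice):

```python
import torch

torch.manual_seed(0)
x = torch.randn(1 << 20, dtype=torch.float32)

# The same numbers summed in a different grouping can give a slightly different
# answer, because float32 addition is not associative; batch-size-dependent
# reduction orders inside GPU kernels expose exactly this effect.
full = x.sum()
halves = x[: 1 << 19].sum() + x[1 << 19 :].sum()
print(full.item(), halves.item(), (full - halves).item())
```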
@ChaseBlagden
Chase Blagden
2 months
@synth_labs
SynthLabs
9 months
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning! 📝 250,000+ problems, 47k NEW Q's ✅ 10x larger than existing datasets like MATH 🧑‍⚖️ Verifiable—we eliminated 400k+ problems Details below! 🧵👇
0
0
1
@ChaseBlagden
Chase Blagden
2 months
Great to see @synth_labs' Big-MATH dataset used in the @thinkymachines blog!
@thinkymachines
Thinking Machines
2 months
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
1
0
4
@synth_labs
SynthLabs
5 months
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
2
10
34
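A minimal sketch of the idea as described in the tweet (hypothetical code, not from the ALP paper; the function name, the linear penalty form, and the coefficient `alpha` are assumptions): the per-prompt solve rate across rollouts acts as an inverse-difficulty weight on a length penalty, so easy prompts are pushed toward shorter responses while hard ones keep their token budget.

```python
import numpy as np


def length_penalty(correct: np.ndarray, lengths: np.ndarray, alpha: float = 1e-3) -> np.ndarray:
    """Hypothetical inverse-difficulty length penalty (sketch of the ALP idea).

    correct: 0/1 outcomes of the rollouts for a single prompt.
    lengths: token counts of those same rollouts.
    Easy prompts (high solve rate) get a stronger per-token penalty, so the
    policy learns to spend fewer tokens on them and more on hard prompts.
    """
    solve_rate = correct.mean()            # empirical solve rate in [0, 1]
    return -alpha * solve_rate * lengths   # added to each rollout's task reward
```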
@ChaseBlagden
Chase Blagden
6 months
0
0
9
@ChaseBlagden
Chase Blagden
6 months
Thank you to @synth_labs and friends for making this possible! 🥳
5
1
24
@NathanThinks
nathan lile
6 months
btw we have ongoing research on this front! we're open-science, pro-publication, and love collaboration. want to push this frontier forward? we're growing our SF team & always open to research partners—reach out, my DMs are open 📩
@NathanThinks
nathan lile
6 months
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment. Scalable (synthetic) evaluation continues to be AI's key bottleneck!
17
9
56
@NathanThinks
nathan lile
6 months
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment. Scalable (synthetic) evaluation continues to be AI's key bottleneck!
@jaseweston
Jason Weston
6 months
🚨 New paper 🚨 J1: Incentivizing Thinking in LLM-as-a-Judge via RL - Converts judgement task into a verifiable one for both verifiable and non-verifiable prompts. Uses only synthetic pairwise data - Optimizes thoughts, scores, and judgments using GRPO - Outperforms all
1
12
95
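A rough, hypothetical sketch of how pairwise judging becomes verifiable (not code from the J1 paper; the 0/1 scheme and the A/B verdict format are assumptions): when the synthetic pair has a known-better response, the judge's verdict can be scored exactly, which is enough signal for GRPO-style RL on the judge's reasoning.

```python
def judge_reward(verdict: str, better_is_a: bool) -> float:
    """Hypothetical 0/1 reward for a pairwise judge on a synthetic pair.

    verdict: the judge's final answer, expected to name 'A' or 'B'.
    better_is_a: ground truth from how the pair was constructed
    (e.g. a known-correct vs. a deliberately corrupted response).
    """
    target = "A" if better_is_a else "B"
    return 1.0 if verdict.strip().upper().startswith(target) else 0.0
```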
@_chotzen
Devin
6 months
Our first long-horizon agentic software engineering model is here! We've shipped a model that matches Claude on Cascade in a lot of ways. However, the most exciting thing about this release is the trajectory we're on. So much left to do... we're hiring!
@windsurf
Windsurf
6 months
Wave 9 is here: a frontier model built for software engineering. Introducing our new family of models: SWE-1, SWE-1-lite, and SWE-1-mini. Based on internal evals, it has performance nearing that of frontier models from the foundation labs. Available now, only in Windsurf!
0
4
34
@ChaseBlagden
Chase Blagden
7 months
>What do you do?
>RL Agents for <X>
0
0
2
@BTolooshams
Bahareh Tolooshams
7 months
We have released VARS-fUSI: Variable sampling for fast and efficient functional ultrasound imaging (fUSI) using neural operators. The first deep learning fUSI method to allow for different sampling durations and rates during training and inference. https://t.co/hHoWJozejz 1/
1
17
49
@superspeeg
Benjamin Spiegel
7 months
Why did only humans invent graphical systems like writing? 🧠✍️ In our new paper at @cogsci_soc, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇
23
186
1K
@ashertrockman
Asher Trockman
7 months
Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your competitors' models are ... thinking a bit too much like yours? Then https://t.co/qwVitSQK6o might be for you! @sama @elonmusk
5
30
141
@nebiusai
Nebius
8 months
Read how @synth_labs, a startup developing AI solutions tailored for logical reasoning, is advancing AI post-training with our @TractoAI: https://t.co/jePovolgcG 🔹 Goal: Develop an ML system that empowers reasoning models to surpass pattern matching and implement sophisticated
2
14
59
@AlbalakAlon
Alon Albalak
9 months
Happy to finally announce Big-MATH, the largest math reasoning dataset purposefully designed for large-scale RL! We worked tirelessly, cleaning and filtering math datasets so that you don't have to!
@synth_labs
SynthLabs
9 months
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning! 📝 250,000+ problems, 47k NEW Q's ✅ 10x larger than existing datasets like MATH 🧑‍⚖️ Verifiable—we eliminated 400k+ problems Details below! 🧵👇
5
16
125
@synth_labs
SynthLabs
9 months
Start exploring Big-MATH today! 📄 Paper: https://t.co/U03ogBwu7Y 💻 Code: https://t.co/suwAiD8hTG 📂 Dataset:
huggingface.co
0
2
6