
Oliver Stanley
@_OliverStanley
Followers
817
Following
4K
Media
36
Statuses
3K
ML engineer | open research | ML systems, RL, data
London
Joined May 2010
Introducing Reasoning Gym: Over 100 procedurally generated reasoning environments for evaluation and RLVR of language models. Generate virtually infinite training or evaluation data with fine-grained difficulty control and automatic verifiers. 🧵 1/
3
44
274
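A minimal sketch of what "procedurally generated" data with difficulty control and an automatic verifier can look like. This is not Reasoning Gym's API: `Task`, `generate_task`, and `verify` are hypothetical names, and the addition task is a toy chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    question: str
    answer: int


def generate_task(seed: int, difficulty: int) -> Task:
    """Hypothetical generator: an addition problem built from a seed.

    `difficulty` controls both the number of terms (difficulty + 1)
    and the upper bound of each term (10 ** difficulty).
    """
    rng = random.Random(seed)  # local RNG, so the same seed gives the same task
    terms = [rng.randint(1, 10 ** difficulty) for _ in range(difficulty + 1)]
    question = " + ".join(str(t) for t in terms) + " = ?"
    return Task(question=question, answer=sum(terms))


def verify(task: Task, response: str) -> float:
    """Hypothetical verifier: 1.0 if the response parses to the exact answer, else 0.0."""
    try:
        return 1.0 if int(response.strip()) == task.answer else 0.0
    except ValueError:
        return 0.0
```

Because each task is a pure function of its seed and difficulty, iterating over seeds yields as many distinct examples as needed, and the verifier's score can serve directly as a reward signal or an evaluation metric.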
Congrats to Prime on the release of their new open reasoning dataset. Nice to see Reasoning Gym tasks used extensively in SYNTHETIC-2!
SYNTHETIC-2 Dataset. A next-gen open dataset for reasoning:
• Verified traces from DeepSeek-R1-0528 and Qwen3 for supervised fine-tuning
• Difficulty-annotated RL tasks via pass@k from smaller models
• 20+ diverse tasks with programmatic verifiers
• Includes non-verifiable
0
0
2
This takes me back to 2023 building Open Assistant. Too many users for the limited GPUs we had for inference, so one idea was to prioritise users who provided more feedback data. Granular feedback from highly heterogeneous human raters is very messy, though.
lmarena has a competitor. Yupp is basically lmarena, but with more granular feedback and a credit system. Each message costs you some credits, but when you give high-quality feedback you get credits back to use on your favorite models. This is their multi-turn (5+ messages) VIBE
0
0
1
Recent @SemiAnalysis_ post on RL touches on this. Designing better, more realistic RL environments feels like some of the highest impact work open-source could focus on right now.
semianalysis.com
The test time scaling paradigm is thriving. Reasoning models continue to rapidly improve, and are becoming more effective and affordable. Evaluations measuring real world software engineering tasks…
0
0
0
RT @zafstojano: Super excited to share 💪🧠Reasoning Gym! 🧵 We provide over 100 data generators and verifiers spanning several domains (alge…
0
22
0
@shizhediao @willccbb For more on our RLVR experiments with Reasoning Gym data, Zafir has an excellent thread here! 9/
Super excited to share 💪🧠Reasoning Gym! 🧵 We provide over 100 data generators and verifiers spanning several domains (algebra, arithmetic, code, geometry, logic, games) for training the next generation of reasoning models. In essence, we can generate an infinite amount of
0
0
8
Really cool to see fast adoption of RG for training and eval! ProRL was recently released by researchers at NVIDIA including @shizhediao. Paper: RG is also already supported in @willccbb’s fantastic verifiers library! 8/
1
1
10
If you’re interested in discussing or collaborating, feel free to reach out to one of us. RG was made possible by incredible work from @zafstojano, @joesharratt29, @jeankaddour, @neurosp1ke, Rich, and Abdulhakeem. 7/
1
0
8
RT @neurosp1ke: We now have a total of 101 datasets in reasoning-gym! 🧠💪 Big THANK YOU 💙 to all devs for making this possible, especially c…
0
18
0
AI safety researchers crafting prompts to intentionally induce behaviour that they can claim is "misaligned" is now a bigger phenomenon than actually misaligned AI, apparently
1/ People think it's cute when Claude 3 Opus fakes alignment to protect its animal welfare values. But here's a more troubling case: DeepSeek R1 faking alignment to block an "American AI company" from retraining the model to remove CCP propaganda.
1
0
1
These guys might straight up drop all the optimisations needed to do inference ~as efficiently as frontier labs, fully open-source, over the next few days. First open-source smallish model with MLA will be a game changer.
🚀 Day 1 of #OpenSourceWeek: FlashMLA. Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS
0
0
3