Richard Zhuang Profile
Richard Zhuang

@RichardZ412

Followers
636
Following
1K
Media
6
Statuses
196

CS @Stanford | Prev. @UCBerkeley @bespokelabsai | LLM Post-Training, Agents, Collective Intelligence

California, USA
Joined September 2021
@RichardZ412
Richard Zhuang
18 days
OpenThoughts-Agent is LIVE! Fully open SFT + RL stack, new small-model SOTA on Terminal-Bench…and this is just V1. Super honored to be part of this cracked team and can’t wait to see how much further we can push this frontier together.
@NeginRaoof_
Negin Raoof
18 days
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
7
9
82
@etash_guha
Etash Guha
4 days
OpenThoughts-Agent dataset is trending on HuggingFace!
1
4
27
@LaudeInstitute
Laude Institute
5 days
Across three days at NeurIPS earlier this month, Laude Lounge became a space for open, working conversations about the future of open frontier AI. We just published a complete digital record of the Lounge, including full-length Laudecast interviews (featuring @JeffDean
1
5
27
@RichardZ412
Richard Zhuang
6 days
+1. My biggest regret at Berkeley is spending way too much time trying to maintain a 4.0 GPA (and my fellow bears would understand how unnecessarily painful that is). Looking back, I definitely should've instead spent that time hanging out with friends, exploring/developing
@jaynitx
Jaynit
7 days
Andrej Karpathy literally revealed why "perfect grades" are a waste of precious time:
27
115
1K
@RichardZ412
Richard Zhuang
6 days
Imagine we have models post-trained on how to post-train themselves better🥲
@maksym_andr
Maksym Andriushchenko
7 days
We release PostTrainBench: a benchmark measuring how well AI agents like Claude Code can post-train base LLMs. We expect this to be an important indicator for AI R&D automation as it unfolds over the next few years. 🔗 https://t.co/dVSSHkpAE1 📂 https://t.co/vqZNrQw66z 1/n
1
1
8
@RichardZ412
Richard Zhuang
8 days
Check out our release!
@LaudeInstitute
Laude Institute
9 days
The final night of Laude Lounge at NeurIPS 2025 focused on stack-level progress in open frontier AI, featuring: Michael Ryan, @DSPyOSS; @etash_guha, @NeginRaoof_, Ben Feuer, @ryanmart3n - OpenThoughts-Agent; @LakshyAAAgrawal, GEPA; @alexgshaw, Harbor; @tyler_griggs_, SkyRL
0
0
11
@xiangyue96
Xiang Yue
15 days
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2 scale LMs on synthetic
28
239
1K
@RichardZ412
Richard Zhuang
16 days
Had so much fun this past week at #NeurIPS2025. Incredible food and views, and I met many fantastic people! Gotta switch back to grind mode but already missing San Diego😢
0
1
17
@AlexGDimakis
Alex Dimakis
18 days
@NeginRaoof_ And here is a picture from the OpenThoughts-Agent launch at the Laude lounge yesterday
0
2
12
@NeginRaoof_
Negin Raoof
18 days
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
17
74
278
@ryanmart3n
Ryan Marten
18 days
@NeginRaoof_ Release blog: https://t.co/AIyYsZltZk We will be releasing our progress here as we go!
openthoughts.ai
Curating the best open agent datasets.
0
3
10
@RichardZ412
Richard Zhuang
19 days
🧑‍🍳🧑‍🍳
@NeginRaoof_
Negin Raoof
19 days
Prepping for the launch tonight 🤖
0
0
16
@RichardZ412
Richard Zhuang
25 days
I'll be in San Diego for #NeurIPS2025 from 12/2 to 12/7! Been working on some exciting research in post-training/reasoning/agents, so I would love to chat about research (and summer internships!) in these areas. Also, please let me know if there are any social events I totally
0
0
5
@RichardZ412
Richard Zhuang
27 days
Happy Thanksgiving, my e-friends on X. I've learned so much from your posts this past year🥹
@gabriel1
gabriel
28 days
remember to thank the people who made you better and shaped you: parents, teachers, friends, friends' parents, or whoever. you are literally sitting on genuine facts that might change someone's view of their entire life, and remove so much regret
0
0
5
@RichardZ412
Richard Zhuang
2 months
Agent Agent Agent
@alexgshaw
Alex Shaw
2 months
Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
0
0
5
@RichardZ412
Richard Zhuang
2 months
Super cool benchmark!
@jyangballin
John Yang
2 months
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
1
0
3
@RichardZ412
Richard Zhuang
2 months
Reading list ++
@_lewtun
Lewis Tunstall
2 months
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover: 🧭 Strategy on whether to train your own LLM and burn all your VC money 🪨 Pretraining,
0
0
1
@RichardZ412
Richard Zhuang
2 months
Best of both worlds hooray
@thinkymachines
Thinking Machines
2 months
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
1
0
1
@alexinexxx
alexine 🏴‍☠️
2 months
bringing back this banger for anyone still struggling with research papers https://t.co/zJyffsGXGZ
35
195
2K