Richard Zhuang
@RichardZ412
636 Followers · 1K Following · 6 Media · 196 Statuses
CS @Stanford | Prev. @UCBerkeley @bespokelabsai | LLM Post-Training, Agents, Collective Intelligence
California, USA
Joined September 2021
OpenThoughts-Agent is LIVE! Fully open SFT + RL stack, new small-model SOTA on Terminal-Bench…and this is just V1. Super honored to be part of this cracked team and can’t wait to see how much further we can push this frontier together.
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
Across three days at NeurIPS earlier this month, Laude Lounge became a space for open, working conversations about the future of open frontier AI. We just published a complete digital record of the Lounge, including full-length Laudecast interviews (featuring @JeffDean
+1. My biggest regret at Berkeley is spending way too much time trying to maintain a 4.0 GPA (and my fellow bears would understand how unnecessarily painful that is). Looking back, I definitely should've instead spent this time hanging out with friends, exploring/developing
Imagine we have models post-trained on how to post-train themselves better🥲
We release PostTrainBench: a benchmark measuring how well AI agents like Claude Code can post-train base LLMs. We expect this to be an important indicator for AI R&D automation as it unfolds over the next few years. 🔗 https://t.co/dVSSHkpAE1 📂 https://t.co/vqZNrQw66z 1/n
Check out our release!
The final night of Laude Lounge at NeurIPS 2025 focused on stack-level progress in open frontier AI, featuring: Michael Ryan, @DSPyOSS; @etash_guha, @NeginRaoof_, Ben Feuer, @ryanmart3n - OpenThoughts-Agent; @LakshyAAAgrawal, GEPA; @alexgshaw, Harbor; @tyler_griggs_, SkyRL
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs on synthetic
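For anyone wondering what a number like pass@128 actually measures: it is usually computed with the standard unbiased pass@k estimator from the HumanEval/Codex paper. Here is a minimal Python sketch, purely illustrative and not tied to the evaluation code in that study:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k
    samples is correct, given n total samples with c correct."""
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed in a numerically stable form.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example with made-up counts: 128 samples per problem, 4 of them correct.
print(pass_at_k(n=128, c=4, k=8))    # chance that a best-of-8 draw solves it
print(pass_at_k(n=128, c=4, k=128))  # pass@128 is 1.0 here, since c > 0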
Had so much fun this past week at #NeurIPS2025. Incredible food and views, and I met many fantastic people! Gotta switch back to grind mode but already missing San Diego😢
@NeginRaoof_ And here is a picture from the OpenThoughts-Agent launch at the Laude lounge yesterday
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
@NeginRaoof_ Release blog: https://t.co/AIyYsZltZk We will be releasing our progress here as we go!
openthoughts.ai
Curating the best open agent datasets.
🧑‍🍳🧑‍🍳
I'll be in San Diego for #NeurIPS2025 from 12/2 to 12/7! Been working on some exciting research in post-training/reasoning/agents, so I would love to chat about research (and summer internships!) in these areas. Also please let me know if there are any social events I totally
Happy Thanksgiving, my e-friends on X. I've learned so much from your posts this past year🥹
remember to thank the people who made you better and shaped you: parents, teachers, friends' parents, or whoever. You are literally sitting on genuine facts that might change someone's view of their entire life and remove so much regret.
Super cool benchmark!
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
And actually just found this super related paper from Danqi’s group:
arxiv.org
Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities -- a phenomenon classically known as catastrophic forgetting. In this paper,...
Best of both worlds hooray
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
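For the curious, the core idea of on-policy distillation (as I understand it) is: sample sequences from the student, then supervise every sampled position with the teacher's distribution via a per-token reverse KL. A toy Python/PyTorch sketch with made-up shapes and random logits standing in for real model outputs; this is not the code from the post:

import torch
import torch.nn.functional as F

def on_policy_distill_loss(student_logits, teacher_logits):
    """Per-position reverse KL(student || teacher).

    Both tensors are (batch, seq, vocab) logits scored on sequences
    that the *student* generated (that is the on-policy part)."""
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl.mean()

# Toy usage: random logits in place of real student/teacher forward passes.
b, t, v = 2, 8, 32
student_logits = torch.randn(b, t, v, requires_grad=True)
teacher_logits = torch.randn(b, t, v)
loss = on_policy_distill_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
print(float(loss))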
bringing back this banger for anyone still struggling with research papers https://t.co/zJyffsGXGZ