Richard Zhuang Profile
Richard Zhuang

@RichardZ412

Followers
636
Following
1K
Media
6
Statuses
196

CS @Stanford | Prev. @UCBerkeley @bespokelabsai | LLM Post-Training, Agents, Collective Intelligence

California, USA
Joined September 2021
@RichardZ412
Richard Zhuang
18 days
OpenThoughts-Agent is LIVE! Fully open SFT + RL stack, new small-model SOTA on Terminal-Bench…and this is just V1. Super honored to be part of this cracked team and can’t wait to see how much further we can push this frontier together.
@NeginRaoof_
Negin Raoof
18 days
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
7
9
82
@etash_guha
Etash Guha
4 days
OpenThoughts-Agent dataset is trending on HuggingFace!
1
4
27
@LaudeInstitute
Laude Institute
5 days
Across three days at NeurIPS earlier this month, Laude Lounge became a space for open, working conversations about the future of open frontier AI. We just published a complete digital record of the Lounge, including full-length Laudecast interviews (featuring @JeffDean
1
5
27
@RichardZ412
Richard Zhuang
6 days
+1. My biggest regret at Berkeley is spending way too much time trying to maintain a 4.0 GPA (and my fellow bears would understand how unnecessarily painful that is). Looking back, I definitely should've instead spent that time hanging out with friends, exploring/developing
@jaynitx
Jaynit
7 days
Andrej Karpathy literally revealed why "perfect grades" are a waste of precious time:
27
115
1K
@RichardZ412
Richard Zhuang
6 days
Imagine we have models post-trained on how to post-train themselves better🥲
@maksym_andr
Maksym Andriushchenko
7 days
We release PostTrainBench: a benchmark measuring how well AI agents like Claude Code can post-train base LLMs. We expect this to be an important indicator for AI R&D automation as it unfolds over the next few years. 🔗 https://t.co/dVSSHkpAE1 📂 https://t.co/vqZNrQw66z 1/n
1
1
8
@RichardZ412
Richard Zhuang
8 days
Check out our release!
@LaudeInstitute
Laude Institute
9 days
The final night of Laude Lounge at NeurIPS 2025 focused on stack-level progress in open frontier AI, featuring: Michael Ryan, @DSPyOSS; @etash_guha, @NeginRaoof_, Ben Feuer, @ryanmart3n - OpenThoughts-Agent; @LakshyAAAgrawal, GEPA; @alexgshaw, Harbor; @tyler_griggs_, SkyRL
0
0
11
@xiangyue96
Xiang Yue
15 days
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2 scale LMs on synthetic
28
239
1K
@RichardZ412
Richard Zhuang
16 days
Had so much fun this past week at #NeurIPS2025. Incredible food and views, and I met many fantastic people! Gotta switch back to grind mode but already missing San Diego😢
0
1
17
@AlexGDimakis
Alex Dimakis
18 days
@NeginRaoof_ And here is a picture from the OpenThoughts-Agent launch at the Laude lounge yesterday
0
2
12
@NeginRaoof_
Negin Raoof
18 days
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
17
74
278
@ryanmart3n
Ryan Marten
18 days
@NeginRaoof_ Release blog: https://t.co/AIyYsZltZk We will be releasing our progress here as we go!
openthoughts.ai
Curating the best open agent datasets.
0
3
10
@RichardZ412
Richard Zhuang
19 days
🧑‍🍳🧑‍🍳
@NeginRaoof_
Negin Raoof
19 days
Prepping for the launch tonight 🤖
0
0
16
@RichardZ412
Richard Zhuang
25 days
I'll be in San Diego for #NeurIPS2025 from 12/2 to 12/7! Been working on some exciting research in post-training/reasoning/agents, so I would love to chat about research (and summer internships!) in these areas. Also, please let me know if there are any social events I totally
0
0
5
@RichardZ412
Richard Zhuang
27 days
Happy Thanksgiving, my e-friends on X. I've learned so much from your posts this past year🥹
@gabriel1
gabriel
28 days
remember to thank the people who made you better and shaped you: parents, teachers, friends, friends' parents, or whoever. you are literally sitting on genuine facts that might change someone's view of their entire life, and remove so much regret
0
0
5
@RichardZ412
Richard Zhuang
2 months
Agent Agent Agent
@alexgshaw
Alex Shaw
2 months
Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
0
0
5
@RichardZ412
Richard Zhuang
2 months
Super cool benchmark!
@jyangballin
John Yang
2 months
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
1
0
3
@RichardZ412
Richard Zhuang
2 months
Reading list ++
@_lewtun
Lewis Tunstall
2 months
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover: 🧭 Strategy on whether to train your own LLM and burn all your VC money 🪨 Pretraining,
0
0
1
@RichardZ412
Richard Zhuang
2 months
Best of both worlds hooray
@thinkymachines
Thinking Machines
2 months
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
1
0
1
@alexinexxx
alexine 🏴‍☠️
2 months
bringing back this banger for anyone still struggling with research papers https://t.co/zJyffsGXGZ
35
195
2K