Simon Guo
@simonguozirui
Followers 4K · Following 7K · Media 91 · Statuses 2K
CS PhD student @Stanford | 🎓 @Berkeley_EECS | prev pre-training @cohere & built things at @anyscalecompute @nvidia
Palo Alto, CA
Joined September 2014
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generation! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have…
10 replies · 62 reposts · 287 likes
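A side note for readers new to the project: the core check behind a KernelBench-style eval is simple to sketch -- a generated kernel counts as correct only if it matches the PyTorch reference on randomized inputs. A minimal schematic (not the benchmark's actual harness; `ref_fn`, `candidate_fn`, and `shapes` are placeholders):

```python
import torch

def kernelbench_style_check(ref_fn, candidate_fn, shapes, trials=5, atol=1e-4):
    """Accept a generated kernel only if it matches the PyTorch reference
    on several randomized inputs (schematic version of the eval loop)."""
    for _ in range(trials):
        inputs = [torch.randn(*s) for s in shapes]
        if not torch.allclose(ref_fn(*inputs), candidate_fn(*inputs), atol=atol):
            return False
    return True
```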
Wrote a blog post on why collective communication feels awkward for newer LLM workloads (disaggregated inference, RL weight updates, MoE), why people don’t just use raw RDMA, how we approached it, and some behind-the-scenes stories.
le.qun.ch
Last week, our team summarized some recent progress we made on point-to-point communication for LLM systems and posted a paper on arXiv. We also open-sourced the code on GitHub. We built an RDMA co...
2 replies · 15 reposts · 116 likes
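The mismatch the post describes is visible even in a toy sketch: a collective forces every rank into one synchronized call, while disaggregated inference, RL weight updates, and MoE mostly want one rank pushing tensors directly to one other rank. A rough stand-in using torch.distributed point-to-point ops (a conceptual illustration, not the post's RDMA engine):

```python
import torch.distributed as dist  # assumes dist.init_process_group(...) already ran

def push_weights(model, dst: int):
    """Trainer side: point-to-point push of updated weights to a single
    inference rank. No collective, so uninvolved ranks keep serving."""
    for i, p in enumerate(model.parameters()):
        dist.send(p.data, dst=dst, tag=i)

def pull_weights(model, src: int):
    """Inference side: receive the matching tensors in the same order."""
    for i, p in enumerate(model.parameters()):
        dist.recv(p.data, src=src, tag=i)
```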
Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. https://t.co/9a5yeC3IT8
7 replies · 41 reposts · 310 likes
Today, we’re announcing the next chapter of Terminal-Bench with two releases:
1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification…
21 replies · 67 reposts · 320 likes
New weekend blogpost. Some light PTX exploration and a simple Top-K kernel.
9 replies · 49 reposts · 581 likes
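For context on what such a kernel is up against: the reference is a couple of lines of PyTorch, and the handwritten PTX/CUDA version has to match it exactly. A baseline sketch (not the blog's kernel):

```python
import torch

def topk_reference(x: torch.Tensor, k: int):
    """Top-K along the last dim via a full sort -- the straightforward
    baseline a handwritten kernel is validated against."""
    vals, idx = torch.sort(x, dim=-1, descending=True)
    return vals[..., :k], idx[..., :k]

x = torch.randn(4, 1024)
assert torch.equal(topk_reference(x, 8)[0], torch.topk(x, 8).values)
```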
Science is best shared! Tell us about what you’ve built or discovered with Tinker, so we can tell the world about it on our blog. More details at
thinkingmachines.ai
Announcing Tinker Community Projects
30 replies · 40 reposts · 341 likes
When we train LLMs with RL, we might have many criteria we want them to satisfy -- tests for code, constraints for free-form text, factuality... Verifying can be hard! We explored how an adversarial critic can automatically and efficiently evaluate generators during training.
Can we run RL to train LLMs on hard-to-verify or open-ended tasks? Even when tasks are verifiable, it is often impossible to check every design detail or catch all mistakes. We can go prompt-tune LLM judges, but is that really the answer? Our new paper introduces RLAC: a…
8 replies · 26 reposts · 281 likes
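A toy rendering of the adversarial-critic idea -- one reading of the setup, not the paper's exact algorithm; `critic_score` stands in for the learned critic:

```python
import random
from typing import Callable, List

def rlac_style_reward(output: str,
                      checks: List[Callable[[str], bool]],
                      critic_score: Callable[[str, Callable], float]) -> float:
    """The critic names the single check it thinks the output is most
    likely to fail, and only that check is executed: one verification
    per sample instead of running every test and constraint."""
    hardest = max(checks, key=lambda c: critic_score(output, c))
    return 1.0 if hardest(output) else 0.0

# Toy usage: cheap predicates as "checks"; a random stub plays the critic.
checks = [lambda s: len(s) < 80, lambda s: "def " in s, lambda s: "TODO" not in s]
print(rlac_style_reward("def f(): return 1", checks, lambda o, c: random.random()))
```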
Scaling Agent Learning via Experience Synthesis
📝: https://t.co/3WXayMsHrD
Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + replay buffer + new tasks = cheap RL for any environment!
- Strong improvements over non-RL-ready…
17 replies · 100 reposts · 525 likes
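The recipe reads like a standard replay loop with the real environment swapped out for an LLM simulator. A toy sketch under that reading (the `env_model` and `policy` interfaces here are invented for illustration):

```python
from collections import deque

replay = deque(maxlen=10_000)  # experience buffer the RL trainer samples from

def synthesize_rollout(env_model, policy, task, horizon=8):
    """Collect a rollout from a *simulated* environment -- a reasoning LLM
    standing in for the real, expensive one -- and stash it for training."""
    state = env_model.reset(task)
    for _ in range(horizon):
        action = policy(state)
        state, reward, done = env_model.step(state, action)  # simulated step
        replay.append((state, action, reward))
        if done:
            break
```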
Meet Slingshots // One. This inaugural batch includes leading-edge researchers advancing the science and practice of AI - with benchmarks, frameworks, and agents that ship real impact into the world. We're honored to support research from: @alexgshaw @Mike_A_Merrill
2 replies · 17 reposts · 61 likes
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human intervention
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built…
574 replies · 2K reposts · 10K likes
Leaving Meta and PyTorch
I'm stepping down from PyTorch and leaving Meta on November 17th. tl;dr: Didn't want to be doing PyTorch forever; seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me. Eleven years…
499 replies · 567 reposts · 11K likes
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to…
In-flight weight updates have gone from a “weird trick” to a must for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave:
11 replies · 61 reposts · 477 likes
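The core trick, as I understand the talk: apply trainer weight updates between decode steps instead of draining all in-flight generations first. A toy illustration (all names invented):

```python
import queue

class TinyPolicy:
    """Stand-in for an inference engine; decode_step emits one 'token'."""
    def __init__(self, version: int = 0):
        self.version = version
    def load(self, new_version: int) -> None:
        self.version = new_version  # stands in for swapping in a checkpoint
    def decode_step(self) -> str:
        return f"tok@v{self.version}"

def generate_with_inflight_updates(policy, updates, max_tokens=6):
    out = []
    for _ in range(max_tokens):
        try:
            # Apply any pending trainer update *between* decode steps,
            # so long generations never stall the training pipeline.
            policy.load(updates.get_nowait())
        except queue.Empty:
            pass
        out.append(policy.decode_step())
    return out

updates = queue.Queue()
updates.put(1)  # trainer pushes new weights while generation is pending
print(generate_with_inflight_updates(TinyPolicy(), updates))
```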
Aight let's talk about frameworks, libraries, RL, and why I probably don't like your favorite RL codebase. Yes, including that one. The unusual thing about RL is that the algorithm is the easy part. GRPO is a single-line equation on some logprobs. If you have the data, computing…
13 replies · 16 reposts · 304 likes
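And for the record, here is roughly that single line -- GRPO's group-normalized advantage -- which really is the easy part:

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: [n_prompts, samples_per_prompt]. Each completion is scored
    relative to its siblings for the same prompt; the policy loss is then
    just -(adv.detach() * logprobs) plus the usual PPO-style clipping."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)
```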
Being part of the pace of robotics right now feels unreal — breakthroughs everywhere, and we get to build some of them. Thanks to everyone who’s visited and shared in the excitement. The future’s racing toward us
37 replies · 27 reposts · 390 likes
KernelFalcon achieves 100% correctness across all 250 KernelBench L1–L3 tasks through a deep agent architecture that structures the problem instead of prompting harder. The system combines hierarchical task decomposition, deterministic orchestration, grounded execution, and…
7 replies · 19 reposts · 139 likes
Over the course of a month, 264 poems were seeded, grown, decayed, and printed. Each poem that was taken was removed from the site. You guys gave new homes to over 200 poems 🌱
1 reply · 2 reposts · 17 likes
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test"
But we code to achieve *goals*: maximize revenue, cut costs, win users
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26 replies · 90 reposts · 365 likes
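The tournament structure is easy to picture. A schematic round-robin, where `duel` is a hypothetical callable that pits two codebases against each other and returns the winner's name:

```python
import itertools
from collections import Counter

def run_tournament(agents, duel, rounds=3):
    """Multi-round round-robin: every pair of agents' codebases faces off,
    and standings reflect the high-level goal (revenue, users, ...) rather
    than success on any single task."""
    wins = Counter()
    for _ in range(rounds):
        for a, b in itertools.combinations(agents, 2):
            wins[duel(a, b)] += 1
    return wins.most_common()
```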
Looking back, it’s wild to see how much has changed. The work, the thinking, the pace. Grateful to work alongside you @chichengcc
Can we collect robot data without any robots?
Introducing Universal Manipulation Interface (UMI)
An open-source $400 system from @Stanford designed to democratize robot data collection
0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
20 replies · 31 reposts · 294 likes
As we code more with AI, we now spend a larger share of our time reading code instead of writing it. Codemaps is an interface for keeping a lot of code in your head while trying to understand complex systems across the codebase.
Introducing Codemaps in @windsurf! powered by SWE-1.5 and Sonnet 4.5
“Your code is your understanding of the problem you’re exploring. So it’s only when you have your code in your head that you really understand the problem.” — @paulg
6 replies · 6 reposts · 101 likes
@ethanboneh started doing research a few weeks ago and has been trying all kinds of cool ideas -- like playing with the new @pytorch Helion DSL and leveraging OpenEvolve to speed up autotuning! If anyone wants to try RL on this task, he also designed an RL environment on…
A week ago I went to my first @gpu_mode hackathon and, together with @manojrajarao, @Ameen_ml, and Emily Shen, placed fourth with HelionEvolve, an OpenEvolve-based autotuner for Helion GPU kernels.
0 replies · 4 reposts · 41 likes
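The autotuning problem here boils down to a search loop like the brute-force sketch below; an evolutionary tuner like HelionEvolve (or an RL agent) searches the same config space far more cleverly:

```python
import time
import torch

def chunked_sum(x, chunk):  # toy "kernel" with one tunable parameter
    return torch.stack([c.sum() for c in x.split(chunk)]).sum()

def autotune(fn, x, configs, iters=20):
    """Brute-force baseline: time every candidate config, keep the fastest."""
    best_cfg, best_t = None, float("inf")
    for cfg in configs:
        t0 = time.perf_counter()
        for _ in range(iters):
            fn(x, **cfg)
        t = (time.perf_counter() - t0) / iters
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t

print(autotune(chunked_sum, torch.randn(1 << 16),
               [{"chunk": c} for c in (256, 1024, 4096)]))
```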