Simon Guo
@simonguozirui
Followers 4K · Following 7K · Media 91 · Statuses 2K
CS PhD student @Stanford | 🎓 @Berkeley_EECS | prev pre-training @cohere & built things at @anyscalecompute @nvidia
Palo Alto, CA
Joined September 2014
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generation! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have…
10 replies · 62 reposts · 287 likes
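A side note for readers new to the project: the core check behind a KernelBench-style eval is simple to sketch -- a generated kernel counts as correct only if it matches the PyTorch reference on randomized inputs. A minimal schematic (not the benchmark's actual harness; `ref_fn`, `candidate_fn`, and `shapes` are placeholders):

```python
import torch

def kernelbench_style_check(ref_fn, candidate_fn, shapes, trials=5, atol=1e-4):
    """Accept a generated kernel only if it matches the PyTorch reference
    on several randomized inputs (schematic version of the eval loop)."""
    for _ in range(trials):
        inputs = [torch.randn(*s) for s in shapes]
        if not torch.allclose(ref_fn(*inputs), candidate_fn(*inputs), atol=atol):
            return False
    return True
```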
Wrote a blog post on why collective communication feels awkward for newer LLM workloads (disaggregated inference, RL weight updates, MoE), why people don’t just use raw RDMA, how we approached it, and some behind-the-scenes stories.
le.qun.ch
Last week, our team summarized some recent progress we made on point-to-point communication for LLM systems and posted a paper on arXiv. We also open-sourced the code on GitHub. We built an RDMA co...
2 replies · 15 reposts · 116 likes
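The mismatch the post describes is visible even in a toy sketch: a collective forces every rank into one synchronized call, while disaggregated inference, RL weight updates, and MoE mostly want one rank pushing tensors directly to one other rank. A rough stand-in using torch.distributed point-to-point ops (a conceptual illustration, not the post's RDMA engine):

```python
import torch.distributed as dist  # assumes dist.init_process_group(...) already ran

def push_weights(model, dst: int):
    """Trainer side: point-to-point push of updated weights to a single
    inference rank. No collective, so uninvolved ranks keep serving."""
    for i, p in enumerate(model.parameters()):
        dist.send(p.data, dst=dst, tag=i)

def pull_weights(model, src: int):
    """Inference side: receive the matching tensors in the same order."""
    for i, p in enumerate(model.parameters()):
        dist.recv(p.data, src=src, tag=i)
```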
Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. https://t.co/9a5yeC3IT8
7 replies · 41 reposts · 310 likes
Today, we’re announcing the next chapter of Terminal-Bench with two releases:
1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification…
21 replies · 67 reposts · 320 likes
New weekend blogpost. Some light PTX exploration and a simple Top-K kernel.
9 replies · 49 reposts · 581 likes
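For context on what such a kernel is up against: the reference is a couple of lines of PyTorch, and the handwritten PTX/CUDA version has to match it exactly. A baseline sketch (not the blog's kernel):

```python
import torch

def topk_reference(x: torch.Tensor, k: int):
    """Top-K along the last dim via a full sort -- the straightforward
    baseline a handwritten kernel is validated against."""
    vals, idx = torch.sort(x, dim=-1, descending=True)
    return vals[..., :k], idx[..., :k]

x = torch.randn(4, 1024)
assert torch.equal(topk_reference(x, 8)[0], torch.topk(x, 8).values)
```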
Science is best shared! Tell us about what you’ve built or discovered with Tinker, so we can tell the world about it on our blog. More details at
thinkingmachines.ai
Announcing Tinker Community Projects
30 replies · 40 reposts · 341 likes
When we train LLMs with RL, we might have many criteria we want them to satisfy -- tests for code, constraints for free-form text, factuality... Verifying can be hard! We explored how an adversarial critic can automatically and efficiently evaluate generators during training.
Can we run RL to train LLMs on hard-to-verify or open-ended tasks? Even when tasks are verifiable, it is often impossible to check every design detail or catch all mistakes. We can go prompt-tune LLM judges, but is that really the answer? Our new paper introduces RLAC: a…
8 replies · 26 reposts · 281 likes
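A toy rendering of the adversarial-critic idea -- one reading of the setup, not the paper's exact algorithm; `critic_score` stands in for the learned critic:

```python
import random
from typing import Callable, List

def rlac_style_reward(output: str,
                      checks: List[Callable[[str], bool]],
                      critic_score: Callable[[str, Callable], float]) -> float:
    """The critic names the single check it thinks the output is most
    likely to fail, and only that check is executed: one verification
    per sample instead of running every test and constraint."""
    hardest = max(checks, key=lambda c: critic_score(output, c))
    return 1.0 if hardest(output) else 0.0

# Toy usage: cheap predicates as "checks"; a random stub plays the critic.
checks = [lambda s: len(s) < 80, lambda s: "def " in s, lambda s: "TODO" not in s]
print(rlac_style_reward("def f(): return 1", checks, lambda o, c: random.random()))
```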
Scaling Agent Learning via Experience Synthesis
📝: https://t.co/3WXayMsHrD
Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + replay buffer + new tasks = cheap RL for any environment!
- Strong improvements over non-RL-ready…
17 replies · 100 reposts · 525 likes
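The recipe reads like a standard replay loop with the real environment swapped out for an LLM simulator. A toy sketch under that reading (the `env_model` and `policy` interfaces here are invented for illustration):

```python
from collections import deque

replay = deque(maxlen=10_000)  # experience buffer the RL trainer samples from

def synthesize_rollout(env_model, policy, task, horizon=8):
    """Collect a rollout from a *simulated* environment -- a reasoning LLM
    standing in for the real, expensive one -- and stash it for training."""
    state = env_model.reset(task)
    for _ in range(horizon):
        action = policy(state)
        state, reward, done = env_model.step(state, action)  # simulated step
        replay.append((state, action, reward))
        if done:
            break
```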
Meet Slingshots // One. This inaugural batch includes leading-edge researchers advancing the science and practice of AI - with benchmarks, frameworks, and agents that ship real impact into the world. We're honored to support research from: @alexgshaw @Mike_A_Merrill
2 replies · 17 reposts · 61 likes
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human intervention
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built…
574 replies · 2K reposts · 10K likes
Leaving Meta and PyTorch
I'm stepping down from PyTorch and leaving Meta on November 17th. tl;dr: Didn't want to be doing PyTorch forever; seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me. Eleven years…
499 replies · 567 reposts · 11K likes
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to…
In-flight weight updates have gone from a “weird trick” to a must for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave:
11 replies · 61 reposts · 477 likes
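The core trick, as I understand the talk: apply trainer weight updates between decode steps instead of draining all in-flight generations first. A toy illustration (all names invented):

```python
import queue

class TinyPolicy:
    """Stand-in for an inference engine; decode_step emits one 'token'."""
    def __init__(self, version: int = 0):
        self.version = version
    def load(self, new_version: int) -> None:
        self.version = new_version  # stands in for swapping in a checkpoint
    def decode_step(self) -> str:
        return f"tok@v{self.version}"

def generate_with_inflight_updates(policy, updates, max_tokens=6):
    out = []
    for _ in range(max_tokens):
        try:
            # Apply any pending trainer update *between* decode steps,
            # so long generations never stall the training pipeline.
            policy.load(updates.get_nowait())
        except queue.Empty:
            pass
        out.append(policy.decode_step())
    return out

updates = queue.Queue()
updates.put(1)  # trainer pushes new weights while generation is pending
print(generate_with_inflight_updates(TinyPolicy(), updates))
```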
Aight let's talk about frameworks, libraries, RL, and why I probably don't like your favorite RL codebase. Yes, including that one. The unusual thing about RL is that the algorithm is the easy part. GRPO is a single-line equation on some logprobs. If you have the data, computing…
13 replies · 16 reposts · 304 likes
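And for the record, here is roughly that single line -- GRPO's group-normalized advantage -- which really is the easy part:

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: [n_prompts, samples_per_prompt]. Each completion is scored
    relative to its siblings for the same prompt; the policy loss is then
    just -(adv.detach() * logprobs) plus the usual PPO-style clipping."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)
```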
Being part of the pace of robotics right now feels unreal — breakthroughs everywhere, and we get to build some of them. Thanks to everyone who’s visited and shared in the excitement. The future’s racing toward us
37 replies · 27 reposts · 390 likes
KernelFalcon achieves 100% correctness across all 250 KernelBench L1–L3 tasks through a deep agent architecture that structures the problem instead of prompting harder. The system combines hierarchical task decomposition, deterministic orchestration, grounded execution, and…
7 replies · 19 reposts · 139 likes
Over the course of a month, 264 poems were seeded, grown, decayed, and printed. Each poem that was taken was removed from the site. You guys gave new homes to over 200 poems 🌱
1 reply · 2 reposts · 17 likes
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test"
But we code to achieve *goals*: maximize revenue, cut costs, win users
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26 replies · 90 reposts · 365 likes
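The tournament structure is easy to picture. A schematic round-robin, where `duel` is a hypothetical callable that pits two codebases against each other and returns the winner's name:

```python
import itertools
from collections import Counter

def run_tournament(agents, duel, rounds=3):
    """Multi-round round-robin: every pair of agents' codebases faces off,
    and standings reflect the high-level goal (revenue, users, ...) rather
    than success on any single task."""
    wins = Counter()
    for _ in range(rounds):
        for a, b in itertools.combinations(agents, 2):
            wins[duel(a, b)] += 1
    return wins.most_common()
```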
Looking back, it’s wild to see how much has changed. The work, the thinking, the pace. Grateful to work alongside you @chichengcc
Can we collect robot data without any robots?
Introducing Universal Manipulation Interface (UMI)
An open-source $400 system from @Stanford designed to democratize robot data collection
0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
20 replies · 31 reposts · 294 likes
As we code more with AI, we now spend a larger share of our time reading code instead of writing it. Codemaps is an interface for keeping a lot of code in your head while trying to understand complex systems across the codebase.
Introducing Codemaps in @windsurf! powered by SWE-1.5 and Sonnet 4.5
“Your code is your understanding of the problem you’re exploring. So it’s only when you have your code in your head that you really understand the problem.” — @paulg
6 replies · 6 reposts · 101 likes
@ethanboneh started doing research a few weeks ago and has been trying all kinds of cool ideas -- like playing with the new @pytorch Helion DSL and leveraging OpenEvolve to speed up autotuning! If anyone wants to try RL on this task, he also designed an RL environment on…
A week ago I went to my first @gpu_mode hackathon and, together with @manojrajarao, @Ameen_ml, and Emily Shen, placed fourth with HelionEvolve, an OpenEvolve-based autotuner for Helion GPU kernels.
0 replies · 4 reposts · 41 likes
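The autotuning problem here boils down to a search loop like the brute-force sketch below; an evolutionary tuner like HelionEvolve (or an RL agent) searches the same config space far more cleverly:

```python
import time
import torch

def chunked_sum(x, chunk):  # toy "kernel" with one tunable parameter
    return torch.stack([c.sum() for c in x.split(chunk)]).sum()

def autotune(fn, x, configs, iters=20):
    """Brute-force baseline: time every candidate config, keep the fastest."""
    best_cfg, best_t = None, float("inf")
    for cfg in configs:
        t0 = time.perf_counter()
        for _ in range(iters):
            fn(x, **cfg)
        t = (time.perf_counter() - t0) / iters
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t

print(autotune(chunked_sum, torch.randn(1 << 16),
               [{"chunk": c} for c in (256, 1024, 4096)]))
```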