Simon Guo

@simonguozirui

Followers 4K · Following 7K · Media 91 · Statuses 2K

CS PhD student @Stanford | 🎓 @Berkeley_EECS | prev pre-training @cohere & built things at @anyscalecompute @nvidia

Palo Alto, CA
Joined September 2014
@simonguozirui
Simon Guo
18 days
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generation! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have
10
62
287
@abcdabcd987
Lequn Chen
21 hours
Wrote a blog post on why collective communication feels awkward for newer LLM workloads (disaggregated inference, RL weight update, MoE), why people don’t just use raw RDMA, how we approached it, and some behind-the-scenes stories.
le.qun.ch
Last week, our team summarized some recent progress we made on point-to-point communication for LLM systems and posted a paper on arXiv. We also open-sourced the code on GitHub. We built an RDMA co...
2
15
116
@srush_nlp
Sasha Rush
2 days
Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. https://t.co/9a5yeC3IT8
7
41
310
@alexgshaw
Alex Shaw
4 days
Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
21
67
320
@AlpinDale
Alpin
3 days
New weekend blogpost. Some light PTX exploration, and a simple Top-K kernel.
9
49
581
@thinkymachines
Thinking Machines
4 days
Science is best shared! Tell us about what you’ve built or discovered with Tinker, so we can tell the world about it on our blog. More details at
thinkingmachines.ai
Announcing Tinker Community Projects
30
40
341
@svlevine
Sergey Levine
4 days
When we train LLMs with RL, we might have many criteria we want them to satisfy -- tests for code, constraints for free-form text, factuality... Verifying can be hard! We explored how an adversarial critic can automatically evaluate generators for efficient training.
@MerlinNoth79247
Mian Wu
4 days
Can we run RL to train LLMs on hard-to-verify or open-ended tasks? Even when tasks are verifiable, it is often impossible to check every design detail or catch all mistakes. We can go prompt-tune LLM judges, but is that really the answer? Our new paper introduces RLAC: a
8
26
281
@jaseweston
Jason Weston
5 days
Scaling Agent Learning via Experience Synthesis 📝: https://t.co/3WXayMsHrD
Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + Replay-buffer + New tasks = cheap RL for any environment!
- Strong improvements over non-RL-ready
17
100
525
@LaudeInstitute
Laude Institute
5 days
Meet Slingshots // One. This inaugural batch includes leading-edge researchers advancing the science and practice of AI - with benchmarks, frameworks, and agents that ship real impact into the world. We're honored to support research from: @alexgshaw @Mike_A_Merrill
2
17
61
@Kimi_Moonshot
Kimi.ai
5 days
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built
574
2K
10K
@soumithchintala
Soumith Chintala
5 days
Leaving Meta and PyTorch
I'm stepping down from PyTorch and leaving Meta on November 17th. tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me. Eleven years
499
567
11K
@agarwl_
Rishabh Agarwal
6 days
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
@alexpiche_
Alexandre L.-Piché
7 days
In-flight weight updates have gone from a “weird trick” to a must-have for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave:
11
61
477
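
As a rough illustration of the pipelining idea in the two posts above, here is a toy Python sketch in which a generator thread keeps sampling rollouts while a trainer thread streams out new weight versions, instead of draining every in-flight rollout before each update. All names here (weight_version, generator, trainer, the sleep durations) are hypothetical; this is not the PipelineRL or ScaleRL implementation.

import queue
import threading
import time

NUM_ROLLOUTS = 20

weight_version = 0              # stands in for the actual model weights
version_lock = threading.Lock()
rollout_queue = queue.Queue()   # rollouts tagged with the version that produced them

def generator():
    # Keeps sampling continuously, picking up whatever weights are freshest
    # at the start of each rollout instead of waiting for a synchronized swap.
    for rollout_id in range(NUM_ROLLOUTS):
        with version_lock:
            version_seen = weight_version
        time.sleep(0.05)        # pretend to decode a long completion
        rollout_queue.put((rollout_id, version_seen))

def trainer():
    # Consumes rollouts as they arrive and pushes a new weight version right
    # away, so later rollouts are sampled closer to on-policy.
    global weight_version
    for _ in range(NUM_ROLLOUTS):
        rollout_id, version_seen = rollout_queue.get()
        time.sleep(0.02)        # pretend to take a gradient step
        with version_lock:
            weight_version += 1
        print(f"trained on rollout {rollout_id} (sampled at version {version_seen})")

gen = threading.Thread(target=generator)
train = threading.Thread(target=trainer)
gen.start(); train.start()
gen.join(); train.join()

As I read the posts above, real systems go further than this toy: the inference engine can pick up fresh weights even mid-generation rather than only between rollouts, which is where much of the throughput and on-policy benefit comes from.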
@redtachyon
Ariel
6 days
Aight let's talk about frameworks, libraries, RL, and why I probably don't like your favorite RL codebase. Yes, including that one. The unusual thing about RL is that the algorithm is the easy part. GRPO is a single-line equation on some logprobs. If you have the data, computing
13
16
304
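
The "single-line equation on some logprobs" point above can be made concrete. Below is a minimal, self-contained NumPy sketch of a GRPO-style loss: group-normalized rewards serve as advantages, applied through a PPO-style clipped probability ratio. The function names, the 0.2 clip range, the 1e-8 epsilon, and the omission of the KL penalty are my own illustrative simplifications, not taken from any codebase mentioned in this feed.

import numpy as np

def grpo_advantages(rewards):
    # Group-relative advantages: normalize each completion's reward against
    # the mean and std of its own group (all samples for the same prompt).
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_policy_loss(logprobs_new, logprobs_old, rewards, clip=0.2):
    # logprobs_new / logprobs_old: summed token log-probs per completion under
    # the current policy and the policy that sampled the data. PPO-style
    # clipped ratio times the group-relative advantage, averaged over the
    # group (KL term omitted for brevity).
    adv = grpo_advantages(rewards)
    ratio = np.exp(np.asarray(logprobs_new) - np.asarray(logprobs_old))
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    return -np.mean(np.minimum(ratio * adv, clipped * adv))

# Example: one prompt, a group of four sampled completions.
print(grpo_policy_loss(
    logprobs_new=[-12.1, -15.3, -11.8, -14.0],
    logprobs_old=[-12.0, -15.0, -12.0, -14.2],
    rewards=[1.0, 0.0, 1.0, 0.0],
))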
@tonyzzhao
Tony Z. Zhao
6 days
Being part of the pace of robotics right now feels unreal — breakthroughs everywhere, and we get to build some of them. Thanks to everyone who’s visited and shared in the excitement. The future’s racing toward us
37
27
390
@PyTorch
PyTorch
6 days
KernelFalcon achieves 100% correctness across all 250 KernelBench L1–L3 tasks through a deep agent architecture that structures the problem instead of prompting harder. The system combines hierarchical task decomposition, deterministic orchestration, grounded execution, and
7
19
139
@a1zhang
Alex L Zhang
6 days
lots of fun announcements / news from @GPU_MODE this week! i sadly wasn't able to go, but they've written up a wonderful blogpost on the IRL hackathon in SF a couple weeks back + all the cool winner projects. go have a read on the website!
1
3
54
@upcycledwords
Alicia Guo
6 days
Over the course of a month, 264 poems were seeded, grown, decayed, and printed. Each poem taken was removed from the site. You guys gave new homes to over 200 poems 🌱
1
2
17
@jyangballin
John Yang
6 days
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test"
But we code to achieve *goals*: maximize revenue, cut costs, win users
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26
90
365
@tonyzzhao
Tony Z. Zhao
6 days
Looking back, it’s wild to see how much has changed. The work, the thinking, the pace. Grateful to work alongside you @chichengcc
@chichengcc
Cheng Chi
2 years
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from @Stanford designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
20
31
294
@silasalberti
Silas Alberti
7 days
as we code more with AI, we now spend a larger percentage of time reading code instead of writing. Codemaps is an interface for keeping a lot of code in your head while trying to understand complex systems across the codebase
@cognition
Cognition
7 days
Introducing Codemaps in @windsurf! powered by SWE-1.5 and Sonnet 4.5
“Your code is your understanding of the problem you’re exploring. So it’s only when you have your code in your head that you really understand the problem.” — @paulg
6
6
101
@simonguozirui
Simon Guo
7 days
@ethanboneh started doing research a few weeks ago and has been trying all kinds of cool ideas — like playing with the new @pytorch Helion DSL and leveraging OpenEvolve to speed up autotuning! If anyone wants to try RL on this task, he also designed an RL environment on
@ethanboneh
Ethan Boneh
7 days
A week ago I went to my first @gpu_mode hackathon, and, together with @manojrajarao, @Ameen_ml and Emily Shen, placed fourth with HelionEvolve, an OpenEvolve-based autotuner for (Helion) GPU kernels.
0
4
41