
Horace He
@cHHillee
Followers 41K · Following 7K · Media 423 · Statuses 3K
@thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale
chhillee
Joined February 2010
For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. https://t.co/IXeUS6AkrY 1/10
25 · 272 · 2K
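For a sense of what the API looks like, here is a minimal sketch (assuming PyTorch 2.5+ and a CUDA device, not taken from the linked thread) of one attention variant, a relative-position bias, expressed as a score_mod that FlexAttention can fuse into a single kernel:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# score_mod receives the raw attention score plus (batch, head, query, key)
# indices, so an attention variant like a relative-position bias is a few lines.
def relative_bias(score, b, h, q_idx, kv_idx):
    return score + (q_idx - kv_idx)

q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = torch.compile(flex_attention)(q, k, v, score_mod=relative_bias)
```

Compiling flex_attention is what produces the fused kernel; calling it eagerly runs a reference implementation instead.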
Thanks to everyone who helped me with the figures and design (@alhyunsoo), helped me with experiments (@jacobmenick), and also helped cut down my exclamation points by a factor of 3. :)
5 · 0 · 103
Apologies that I haven't written anything since joining Thinking Machines but I hope this blog post on a topic very near and dear to my heart (reproducible floating point numerics in LLM inference) will make up for it!
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”. We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to …
69 · 209 · 3K
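For readers who haven't hit this before, a minimal illustration of the underlying numerics issue: floating-point addition isn't associative, so the same logical reduction can give different results depending on how it gets split up (e.g. by batch size or kernel tiling). The array size and chunking below are arbitrary.

```python
import torch

# Non-associativity with plain Python floats:
a, b, c = 0.1, 1e20, -1e20
print((a + b) + c)  # 0.0
print(a + (b + c))  # 0.1

# The same effect in a tensor reduction: a one-pass sum vs. summing
# per-chunk partials usually differs in the last few bits.
x = torch.randn(1 << 20, dtype=torch.float32)
one_pass = x.sum()
chunked = torch.stack([chunk.sum() for chunk in x.chunk(1024)]).sum()
print((one_pass - chunked).item())  # typically a small nonzero difference
```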
Suno 4.5 is quite impressive. Previously AI music was only ever interesting for the novelty. Now, I wouldn't blink if I heard one of these songs on a playlist. First generation I tried: Prompt: "Pop song about optimizing CUDA kernels for LLM training" https://t.co/p2ehQlpacr
8 · 8 · 212
When it comes to hardware that's meant for training or inference, most people think in terms of hardware specs like memory bandwidth, even though dev velocity is often a more important factor. One implication is that RL training and prod. inference are meaningfully different workloads.
12 · 8 · 257
This is super cool! With FlexAttention, you can now build a minimal "throughput-oriented" inference system without needing custom kernels! One particularly neat part about using FlexAttention for this is that PagedAttention just ended up being a special case of the abstraction!
while we wait for gpt-5 to drop. Here is a flex attention tutorial for building a < 1000 LoC vllm from scratch https://t.co/PVyauMezM3
1 · 13 · 234
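To make the "PagedAttention as a special case" point concrete, here is a minimal sketch (not the linked tutorial's code) of attention over a paged physical KV cache: the page-table indirection lives entirely inside a mask_mod that translates physical cache slots back to logical positions. The page_table / physical_to_logical layout below is hypothetical, and it assumes PyTorch 2.5+ and a CUDA device.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

PAGE = 128                              # tokens per KV-cache page
B, H, D = 1, 4, 64
num_physical_pages = 4
physical_len = num_physical_pages * PAGE
logical_len = 2 * PAGE                  # two logical pages actually in use

# Hypothetical page table: logical page i lives at physical page page_table[i].
page_table = torch.tensor([2, 0], device="cuda")

# Inverse map from physical cache slot -> logical token position (-1 = unused slot).
physical_to_logical = torch.full((physical_len,), -1, dtype=torch.long, device="cuda")
for logical_page, physical_page in enumerate(page_table.tolist()):
    slots = torch.arange(PAGE, device="cuda") + physical_page * PAGE
    physical_to_logical[slots] = torch.arange(PAGE, device="cuda") + logical_page * PAGE

q = torch.randn(B, H, logical_len, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, physical_len, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, physical_len, D, device="cuda", dtype=torch.float16)

def paged_causal_mask(b, h, q_idx, kv_idx):
    # Translate the physical cache slot back to its logical position, then
    # apply an ordinary causal mask in logical coordinates.
    logical_kv = physical_to_logical[kv_idx]
    return (logical_kv >= 0) & (logical_kv <= q_idx)

block_mask = create_block_mask(paged_causal_mask, B, H, logical_len, physical_len, device="cuda")
out = torch.compile(flex_attention)(q, k, v, block_mask=block_mask)
```

All the paging logic is data (a lookup table) rather than a bespoke kernel, which is the sense in which PagedAttention falls out of the abstraction.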
Not a great look that after presenting GPT-5's reduced hallucinations, their first example repeats a common misconception about how plane wings generate lift ("equal transit theory").
28 · 50 · 765
You're no match for OpenAI's marketing team.
47 · 67 · 2K
Other than OpenAI, how many other AI efforts do you think will have gotten a gold medal at the IMO? Several other AI labs are vagueposting about their IMO results, but seem to be abiding by the IMO's request for a week's delay.
8 · 1 · 29
It's been an exciting 3 months at Thinky and so much has happened already! Imo we're building some of the best research infra around. Research infra is about jointly optimizing researcher *and* GPU efficiency, and it's been a joy to work on this with the other great folk here!
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're …
12 · 12 · 448
I'll be at MLSys today! DM me if you want to chat about PyTorch, ML systems, or life at Thinking Machines!
6 · 0 · 96
The fundamental question here (computing MFU) is a very reasonable question to ask in an interview (and I'd recommend learning how to do it if you don't know). However, the real interview question I would like to ask is this: "I see 3 assumptions in this question that range from …
13 · 6 · 286
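For reference, a sketch of the back-of-the-envelope MFU calculation the tweet alludes to: approximate training FLOPs as 6 x parameters x tokens and divide the achieved rate by the hardware peak. The throughput and peak numbers below are made up for illustration.

```python
# Model FLOPs utilization (MFU), under the usual "6 * params * tokens"
# approximation for dense transformer training (2 FLOPs/param/token forward,
# 4 backward).
def mfu(params: float, tokens_per_sec: float, num_gpus: int, peak_flops_per_gpu: float) -> float:
    achieved_flops_per_sec = 6 * params * tokens_per_sec
    peak_flops_per_sec = num_gpus * peak_flops_per_gpu
    return achieved_flops_per_sec / peak_flops_per_sec

# Illustrative numbers: a 70B-param model at 300k tokens/s on 256 GPUs,
# assuming ~1e15 dense BF16 FLOP/s peak per GPU.
print(f"{mfu(70e9, 3e5, 256, 1e15):.1%}")   # ~49%
```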
When this word started popping up I initially smugly thought that people were misspelling "syncophant" only to realize that I'd entangled "sycophant" with "syncopation" in my head.
4 · 2 · 35
This is pretty neat. They hook into torch.compile and insert some profile-guided optimizations, as well as a bunch of other specific optimizations like offloading. Since torch.compile is all in Python, all their compiler passes are fairly accessible too! https://t.co/gxpcGQlILf
github.com: This PR introduces DeepCompile, a new feature that efficiently integrates compiler optimizations with other DeepSpeed features. DeepCompile utilizes torch's dynamo to capture the computatio...
Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations. - Automatic parallelization & profile-guided optimizations - Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes - 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading https://t.co/1DzW7buCO6
1 · 29 · 227
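As a small illustration of the "all in Python" point (not DeepCompile's actual pass, just a sketch): a torch.compile backend is an ordinary Python function that receives the captured FX graph, so writing or inspecting a pass doesn't require touching C++.

```python
import torch

def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    # A "pass" here is just Python code walking the captured FX graph;
    # a real backend would rewrite the graph before handing it off.
    for node in gm.graph.nodes:
        print(node.op, node.target)
    return gm.forward  # fall back to running the captured graph as-is

@torch.compile(backend=inspecting_backend)
def f(x):
    return torch.relu(x) + 1

f(torch.randn(8))
```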
I'll be here talking about ML systems! There'll be some of the best GPU folk I know, so come and learn more about Blackwell GPUs together!
SemiAnalysis is hosting an Nvidia Blackwell GPU Hackathon on Sunday March 16th. It is the ultimate playground for Blackwell PTX tech enthusiasts, offering hands-on exploration of Blackwell & PTX infrastructure while collaborating on open-source projects.
7 · 16 · 227
If you're interested in working together at @thinkymachines, please DM me on Twitter. (Also feel free to DM me if you want to work on PyTorch.) I've been grateful to work at PyTorch, and I hope Thinking Machines will be just as fulfilling. 6/6
11 · 4 · 332
However, @thinkymachines ended up being an extremely compelling opportunity: the chance to be part of an extremely strong (and nice!) founding team, the ability to continue contributing to open systems, and an approach to "making AI go good" that resonated with me. 5/6
3 · 4 · 168
The actual day-to-day on PyTorch has also been amazing - working on a project that undergirds the industry and values OSS impact provides shelter from big-tech politics and amazing career opportunities. https://t.co/SDeBA8U0BQ 4/6
True Story! One of the many reasons I love open source is it doesn't give a damn about the org chart or "managing up." If people outside of FB/Meta didn't use or like our OSS then something was wrong with it. PyTorch succeeded because of the hyper focus on developer …
1 · 2 · 115