Horace He Profile
Horace He

@cHHillee

Followers
41K
Following
7K
Media
423
Statuses
3K

@thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale

chhillee
Joined February 2010
@cHHillee
Horace He
1 year
For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. https://t.co/IXeUS6AkrY 1/10
25
272
2K
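The core idea behind FlexAttention is a user-supplied `score_mod` hook that edits each pre-softmax attention score. The sketch below is a simplified pure-Python reference of that idea, not the real API: actual FlexAttention operates on batched tensors, its `score_mod` also receives batch and head indices, and it compiles to a fused kernel.

```python
import math

def attention_with_score_mod(q, k, v, score_mod):
    """Reference attention where score_mod edits each pre-softmax score.

    q, k, v: lists of equal-length float vectors (seq_len x dim).
    score_mod(score, q_idx, kv_idx): returns the modified score.
    (FlexAttention's hook also gets batch/head indices; omitted here.)
    """
    dim = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Scaled dot-product scores, each passed through score_mod.
        scores = [score_mod(
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim), i, j)
            for j, kj in enumerate(k)]
        # Numerically stable softmax over the (possibly masked) scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * vj[d] for e, vj in zip(exps, v))
                    for d in range(len(v[0]))])
    return out

# Causal masking expressed as a score_mod: future keys score -inf,
# so they get zero weight after softmax.
causal = lambda score, q_idx, kv_idx: (
    score if kv_idx <= q_idx else float("-inf"))

q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = attention_with_score_mod(q, k, v, causal)
# Position 0 can only attend to key 0, so out[0] == v[0].
```

Many attention variants (ALiBi, sliding windows, relative bias) are just different `score_mod` functions under this abstraction.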
@cHHillee
Horace He
3 days
Thanks to everyone who helped me with the figures and design (@alhyunsoo), helped me with experiments (@jacobmenick), and helped cut down my exclamation points by a factor of 3. :)
5
0
103
@cHHillee
Horace He
3 days
Apologies that I haven't written anything since joining Thinking Machines but I hope this blog post on a topic very near and dear to my heart (reproducible floating point numerics in LLM inference) will make up for it!
@thinkymachines
Thinking Machines
3 days
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
69
209
3K
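The numerics issue behind nondeterministic LLM inference can be seen in one line: floating-point addition is not associative, so when a kernel changes its reduction order (e.g. because batch size or split strategy changed), the low-order bits of a sum change. A minimal illustration:

```python
# Floating-point addition is not associative: the order in which a
# kernel reduces a sum changes the low-order bits of the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one reduction order
right = a + (b + c)   # another reduction order
different = left != right  # True in IEEE 754 double precision
```

In an LLM, those last-bit differences feed through softmax and sampling, which is how identical prompts can produce different outputs.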
@cHHillee
Horace He
19 days
Suno 4.5 is quite impressive. Previously AI music was only ever interesting for the novelty. Now, I wouldn't blink if I heard one of these songs on a playlist. First generation I tried: Prompt: "Pop song about optimizing CUDA kernels for LLM training" https://t.co/p2ehQlpacr
8
8
212
@cHHillee
Horace He
28 days
When it comes to hardware that's meant for training or inference, most people think in terms of hardware specs like memory bandwidth, even though dev velocity is often a more important factor. One implication is that RL training and prod. inference are meaningfully different workloads.
12
8
257
@cHHillee
Horace He
1 month
This is super cool! With FlexAttention, you can now build a minimal "throughput-oriented" inference system without needing custom kernels! One neat part about using FlexAttention for this is that PagedAttention just ended up being a special case of the abstraction!
@ChangJonathanC
Jonathan Chang
1 month
while we wait for gpt-5 to drop. Here is a flex attention tutorial for building a < 1000 LoC vllm from scratch https://t.co/PVyauMezM3
1
13
234
@cHHillee
Horace He
1 month
Not a great look that after presenting GPT5's reduced hallucinations, their first example repeats a common misconception about how plane wings generate lift ("equal transit theory").
28
50
765
@cHHillee
Horace He
1 month
You're no match for OpenAI's marketing team.
@typedfemale
typedfemale
1 month
i should work in marketing
47
67
2K
@cHHillee
Horace He
2 months
Besides OpenAI, how many other AI efforts do you think will have gotten a gold medal at the IMO? Several other AI labs are vagueposting about their IMO results, but seem to be abiding by the IMO's request for a week's delay.
8
1
29
@cHHillee
Horace He
2 months
It's been an exciting 3 months at Thinky and so much has happened already! Imo we're building some of the best research infra around. Research infra is about jointly optimizing researcher *and* GPU efficiency, and it's been a joy to work on this with the other great folk here!
@miramurati
Mira Murati
2 months
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're
12
12
448
@cHHillee
Horace He
4 months
I'll be at MLSys today! DM me if you want to chat about PyTorch, ML systems, or life at Thinking Machines!
6
0
96
@cHHillee
Horace He
4 months
The fundamental question here (computing MFU) is a very reasonable question to ask in an interview (and I'd recommend learning it if you don't know how). However, the real interview question I would like to ask is this: "I see 3 assumptions in this question that range from
@nrehiew_
wh
4 months
Saw this on Reddit with half the comments shitting on it
13
6
286
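For reference, the standard back-of-envelope MFU calculation the thread alludes to: MFU is achieved model FLOP/s divided by the hardware's peak FLOP/s, with dense transformer training FLOPs commonly approximated as ~6 per parameter per token (forward + backward). The numbers below are hypothetical, not from the screenshot.

```python
def mfu(params, tokens_per_sec, peak_flops_per_sec):
    """Model FLOPs Utilization under the common ~6*N FLOPs/token
    approximation for dense transformer training (fwd + bwd)."""
    achieved_flops_per_sec = 6 * params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec

# Hypothetical numbers: 7B params, 4000 tok/s per GPU,
# 312 TFLOP/s peak BF16 (A100-class).
u = mfu(7e9, 4_000, 312e12)  # ~0.54
```

Each input hides an assumption (dense vs. MoE FLOP counting, which peak-FLOPs figure to use, whether activation recomputation counts), which is presumably what "3 assumptions" points at.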
@cHHillee
Horace He
4 months
When this word started popping up I initially smugly thought that people were misspelling "syncophant" only to realize that I'd entangled "sycophant" with "syncopation" in my head.
@DanielleFong
Danielle Fong 🔆
5 months
people using sycophant like they knew what it was
4
2
35
@cHHillee
Horace He
5 months
This is pretty neat. They insert into torch.compile and insert some profile-guided optimizations as well as a bunch of other specific optimizations like offloading. Since torch.compile is all in Python all their compiler passes are fairly accessible too! https://t.co/gxpcGQlILf
@DeepSpeedAI
DeepSpeed
5 months
Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations. - Automatic parallelization & profile-guided optimizations - Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes - 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading https://t.co/1DzW7buCO6
1
29
227
@cHHillee
Horace He
6 months
I'll be there talking about ML systems! Some of the best GPU folk I know will be there too, so come learn more about Blackwell GPUs together!
@SemiAnalysis_
SemiAnalysis
6 months
SemiAnalysis is hosting an Nvidia Blackwell GPU Hackathon on Sunday March 16th. It is the ultimate playground for Blackwell PTX tech enthusiasts, offering hands-on exploration of Blackwell & PTX infrastructure while collaborating on open-source projects.
7
16
227
@cHHillee
Horace He
6 months
If you're interested in working together at @thinkymachines, please DM me on Twitter. (Also feel free to DM me if you want to work on PyTorch.) I've been grateful to work on PyTorch, and I hope Thinking Machines will be just as fulfilling. 6/6
11
4
332
@cHHillee
Horace He
6 months
However, @thinkymachines ended up being an extremely compelling opportunity. The opportunity to be part of an extremely strong (and nice!) founding team, being able to continue to contribute to open systems, and an approach to "making AI go good" that resonated with me. 5/6
3
4
168
@cHHillee
Horace He
6 months
The actual day-to-day on PyTorch has also been amazing - working on a project that undergirds the industry and values OSS impact provides shelter from big-tech politics and amazing career opportunities. https://t.co/SDeBA8U0BQ 4/6
@schrep
Mike Schroepfer
1 year
True Story! One of the many reasons I love open source is it doesn't give a damn about the org chart or "managing up." If people outside of FB/Meta didn't use or like our OSS then something was wrong with it. PyTorch succeeded because of the hyper focus on developer
1
2
115