Horace He
@cHHillee
43K Followers · 7K Following · 426 Media · 3K Statuses
@thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale
Joined February 2010
Apologies that I haven't written anything since joining Thinking Machines, but I hope this blog post on a topic very near and dear to my heart (reproducible floating point numerics in LLM inference) will make up for it!
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”. We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…
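A minimal sketch (mine, not from the post) of the root issue: floating point addition isn't associative, so a kernel whose reduction order changes with batch size or parallelism can return different results for identical inputs.

```python
import torch

# Floating point addition is not associative: association changes the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))         # False

torch.manual_seed(0)
x = torch.randn(100_000)                               # float32

a = x.sum()                                            # one reduction order
b = torch.stack([c.sum() for c in x.chunk(7)]).sum()   # chunk, then combine

# Same numbers, different association: the results typically disagree in the
# last few bits. Harmless for one op, but enough for greedy decoding to
# diverge when a server's kernels change reduction strategy with batch size.
print(a.item(), b.item(), bool(a == b))
```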
I'll be at the Triton conference today, the PyTorch conference in the morning tomorrow, and then the GPU MODE hackathon on Friday! DM me if you'd like to meet up/chat about Thinking Machines, PyTorch, ML systems, or anything else!
I'll be at the Triton conference on Tuesday, the PyTorch conference in the morning on Wednesday, and the GPU MODE hackathon on Friday. DM me if you want to chat!
Luckily, that's where Tinker comes in! We can batch user requests together and run them on efficient training/inference setups, enabling far better efficiency for users without needing massive multi-GPU setups or futzing with the infra. I'm hopeful that this makes it far easier for…
This isn't even talking about memory: even if you had 8192 parallel requests to process, your GPU probably doesn't have enough memory to handle all of them. Sadly, these factors all push fine-tuning/RL out of reach of hobbyist setups :( (4/5)
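To put rough numbers on the memory point, here is a back-of-envelope sketch; the model dimensions below are hypothetical stand-ins, not figures from the thread:

```python
# Back-of-envelope KV-cache sizing (hypothetical dims: a dense-attention
# model with GQA, served in bf16 -- not numbers from the thread).
n_layers, n_kv_heads, head_dim = 60, 8, 128
bytes_per_el = 2                       # bf16
ctx_len, n_requests = 4096, 8192

kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_el  # K and V
per_request = kv_per_token * ctx_len
total = per_request * n_requests

print(f"{kv_per_token / 2**10:.0f} KiB/token, "
      f"{per_request / 2**30:.2f} GiB/request, "
      f"{total / 2**40:.1f} TiB for {n_requests} requests")
# -> 240 KiB/token, 0.94 GiB/request, 7.5 TiB total; a single H100 has
#    only ~0.08 TiB of HBM.
```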
Since decoding processes only one token at a time, we need >256 parallel requests for efficiency. *However*, in MoE models, each token only gets routed to some experts! So for DeepSeek-V3 (sparsity factor of 32), we now need 256*32 = 8192 parallel requests to get good efficiency! (3/5)
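The routing math as a sketch (the expert counts are illustrative assumptions in the style of DeepSeek-V3's top-k routing): with top-k of E experts, each expert only sees batch * k / E tokens, so the batch must grow by the sparsity factor E/k.

```python
# Per-expert token counts under top-k routing (illustrative numbers).
tokens_per_matmul = 256        # roofline target from (2/5)
n_experts, top_k = 256, 8      # DeepSeek-V3-style: sparsity factor 256/8 = 32
sparsity = n_experts // top_k

# Each token activates top_k of n_experts, so an expert sees
# batch * top_k / n_experts tokens on average.
batch_needed = tokens_per_matmul * sparsity
print(batch_needed)            # 8192 parallel requests at decode time
```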
Fundamentally, in order to get good efficiency on GPUs, you must run with a large "batch size": the matmuls simply don't have enough arithmetic intensity at low batch sizes to achieve good performance. In practice, this means you need >256 tokens. (2/5)
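A rough roofline sketch of that claim, with assumed H100-ish peak numbers (not from the tweet): for a (B×D)·(D×D) matmul in bf16, arithmetic intensity is roughly B FLOPs/byte when B ≪ D, so small batches sit well below the compute/bandwidth ridge.

```python
# Roofline sketch for a (B x D) @ (D x D) matmul in bf16.
# Assumed H100-ish peaks: ~990 bf16 TFLOP/s, ~3.35 TB/s HBM bandwidth.
peak_flops, peak_bw = 990e12, 3.35e12
ridge = peak_flops / peak_bw                     # ~295 FLOPs/byte to be compute-bound

D = 8192
for B in (1, 32, 256, 1024):
    flops = 2 * B * D * D                        # multiply-accumulates
    bytes_moved = 2 * (B * D + D * D + B * D)    # read act + weight, write out
    intensity = flops / bytes_moved              # ~= B when B << D
    bound = "compute" if intensity > ridge else "bandwidth"
    print(f"B={B:5d}  intensity={intensity:6.1f} FLOPs/byte  ({bound}-bound)")
```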
One interesting "fundamental" reason for Tinker today is the rise of MoE. Whereas hackers used to deploy llama3-70B efficiently on one node, modern MoE models require large multinode deployments for efficiency. The underlying reason? Arithmetic intensity. (1/5)
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
I'm actually just very confused about how the numbering on this leaderboard works.
> top 10
> Ranked 23rd
🚨 Big leaderboard update on the toughest Arena to crack: Text 📝 Seven new models landed today, and five broke straight into the Top 10 🏎️ 💨
🔹 #8: Qwen3-VL-235B-a22b-Instruct & Qwen3-Max-2025-09-23 (tied) by @alibaba_qwen
🔹 #9: DeepSeek V3.1 Terminus (Standard & Thinking…
I quite enjoyed this and it covers a bunch of topics without good introductory resources!
1. A bunch of GPU hardware details in one place (warp schedulers, shared memory, etc.)
2. A breakdown/walkthrough of reading PTX and SASS.
3. Some details/walkthroughs of a number of other…
New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state-of-the-art matmul kernels in CUDA, read along. (Remember, matmul is the single most important operation that transformers execute…
Modular Manifolds: managed metrics (i.e. Muon) meets manifolds, making matrix magnitudes manageable. Or M^11, as I like to call it. Check out this great post by @jxbz! It introduces some cool new ideas but also doubles as a great intro to optimization beyond Adam.
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
I actually just gave a talk at MIT a couple of days ago on some challenges in ML compilers where this was a slide. When I saw this today, I hurriedly sent this blog post over.
Thanks to everyone who helped me with the figures and design (@alhyunsoo), helped me with experiments (@jacobmenick), and also helped cut down my exclamation points by a factor of 3. :)
Suno 4.5 is quite impressive. Previously, AI music was only ever interesting for the novelty. Now, I wouldn't blink if I heard one of these songs on a playlist. First generation I tried: Prompt: "Pop song about optimizing CUDA kernels for LLM training" https://t.co/p2ehQlpacr
When it comes to hardware that's meant for training or inference, most people think in terms of hardware specs like memory bandwidth, even though dev velocity is often a more important factor. One implication is that RL training and production inference are meaningfully different workloads.