
Dan Fu
@realDanFu
6K Followers · 1K Following · 190 Media · 786 Statuses
Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also running the kernels team @togethercompute.
Joined September 2019
I really enjoyed this talk from @bariskasikci at @ESFoMo - some very fine-grained analysis of the compute patterns of LLM serving in the throughput-bound regime, and how to schedule operations to push the boundaries (with a linear program)! Great work!
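(For flavor, here's a toy of the linear-program view of scheduling. All numbers and the prefill/decode split are my own illustration, not from the talk: pick a per-step batch mix that maximizes tokens/sec without blowing the compute or memory-bandwidth budget.)

from scipy.optimize import linprog

# All numbers below are made up for illustration. Decision variables:
# how many prefill and decode requests to batch in one step.
c = [-256, -1]               # negate to maximize tokens/step (256 per prefill, 1 per decode)
A_ub = [[180.0, 2.0],        # GFLOPs consumed per request of each type
        [0.5, 1.5]]          # GB of HBM traffic per request of each type
b_ub = [900.0, 40.0]         # per-step compute and bandwidth budgets
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)                 # optimal prefill/decode mix under these budgets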
ES-FoMo is back tomorrow! Come join us in East Exhibition Hall A bright and early at 8:30AM for a great slate of invited talks, orals, spotlight lightning talks, and 150 posters!
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
And @keshigeyan is going to be presenting Grafting - a great collaboration with @MichaelPoli6 on how to distill pretrained diffusion models into new architectures (Transformers -> Hyenas). 4/
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 Co-led with @MichaelPoli6
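(A minimal sketch of the grafting idea as I understand it: train the replacement operator to mimic the pretrained block it swaps out on cached activations. The function names and training loop below are my own illustration, not the paper's actual recipe.)

import torch
import torch.nn.functional as F

def graft_block(old_block, new_block, cached_inputs, steps=100, lr=1e-4):
    # Train the replacement operator (e.g. a Hyena block) to match the
    # pretrained block it replaces, using activations cached from the
    # original model. Hypothetical loop, not the paper's recipe.
    opt = torch.optim.Adam(new_block.parameters(), lr=lr)
    for _ in range(steps):
        for x in cached_inputs:
            with torch.no_grad():
                target = old_block(x)          # teacher: the original block
            loss = F.mse_loss(new_block(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return new_block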
Two papers at the workshop I’m a bit fond of… @austinsilveria and @SohamGovande are going to be presenting Chipmunk - come chat with them about how they made video diffusion 3.7x faster (with custom column-sparse attention kernels)! 3/
Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas -- collaboration with @SohamGovande and @realDanFu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
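(The cross-step delta trick, sketched in plain PyTorch: between diffusion steps, recompute only the tokens whose inputs changed most and reuse cached outputs for the rest. This is an illustrative toy - `delta_cached_mlp` and `keep_frac` are made-up names, and Chipmunk's real kernels do this at column granularity in custom CUDA.)

import torch

def delta_cached_mlp(x, mlp, cache, keep_frac=0.1):
    # x: [batch, tokens, dim] activations at the current diffusion step.
    # cache: (prev_x, prev_y) from the previous step (hypothetical layout).
    prev_x, prev_y = cache
    delta = (x - prev_x).norm(dim=-1)                  # per-token input change
    k = max(1, int(keep_frac * delta.shape[-1]))
    idx = delta.topk(k, dim=-1).indices                # most-changed tokens
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, x.shape[-1])
    y = prev_y.clone()
    # Recompute the MLP only for the selected tokens; reuse cached outputs.
    y.scatter_(1, gather_idx, mlp(torch.gather(x, 1, gather_idx)))
    return y, (x, y)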
On Saturday we’re hosting the ES-FoMo workshop with @tri_dao, @dan_biderman, @simran_s_arora, @m_ryabinin, and others - we’ve got a great slate of papers and invited talks, come join us! (More on the speakers soon.) 2/
ES-FoMo is back for round three at #ICML2025! Join us in Vancouver on Saturday July 19 for a day dedicated to Efficient Systems for Foundation Models: from 💬reasoning models to 🖼️scalable multimodality, 🧱efficient architectures, and more! Submissions due May 26! More below 👇
I’m off to #ICML2025 in Vancouver! (After an unusually eventful first flight - our plane had a wing problem, so we had to make an emergency landing back at SFO & switch planes.) Reach out if you’d like to chat about (mega)kernels, @togethercompute, or anything MLSys! 1/
Fastest DeepSeek! Super proud of the amazing inference team at Together for pulling this off!
Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528. We’ve upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs, and the results speak for themselves: 📈 Highest known serverless throughput: 334 tokens/sec. 🏃 Fastest time to first answer token:
Synthetics like associative recall and MQAR are a great guide for building models. Excited to see this work from @nick11roberts on creating new LMs!
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025!.We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
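(For anyone unfamiliar with MQAR: it's a synthetic multi-query associative recall task - show the model key-value pairs, then query some of the keys. A toy generator, with a layout and vocab split that are illustrative rather than the paper's exact setup:)

import random

def mqar_example(num_pairs=8, num_queries=4, vocab=64, seed=0):
    # One MQAR example: key-value pairs, then queries over those keys.
    # Vocab split (keys < vocab, values >= vocab) is my choice, not the paper's.
    rng = random.Random(seed)
    keys = rng.sample(range(vocab), num_pairs)
    vals = [rng.randrange(vocab, 2 * vocab) for _ in keys]
    binding = dict(zip(keys, vals))
    seq = [tok for pair in zip(keys, vals) for tok in pair]   # k1 v1 k2 v2 ...
    queries = rng.sample(keys, num_queries)
    targets = [binding[q] for q in queries]                   # what the model must recall
    return seq + queries, targets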
This is really cool! There are a ton of places where a dynamic differentiable hierarchy makes sense. Awesome to see progress here!
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
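(A toy, non-differentiable caricature of the chunking idea - start a new chunk wherever adjacent hidden states diverge, so granularity adapts to content. The real architecture learns boundaries end-to-end; the threshold rule here is just my illustration.)

import torch

def chunk_boundaries(h, threshold=0.5):
    # h: [tokens, dim] hidden states over a raw byte/character stream.
    # Start a new chunk wherever adjacent states disagree (low cosine sim).
    sim = torch.cosine_similarity(h[:-1], h[1:], dim=-1)
    return (sim < threshold).nonzero(as_tuple=True)[0] + 1    # chunk start indices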
RT @KumbongHermann: Happy to share that our HMAR code and pre-trained models are now publicly available. Please try them out here: code: h…
Day zero support for Flux kontext dev on Chipmunk! Great work @austinsilveria!
RT @austinsilveria: 🐿️ chipmunk ship! flux kontext supported for up to 30% faster cute chipmunks!
Chipmunks for everyone!
Chipmunks can now hop across multiple GPU architectures (sm_80, sm_89, sm_90). You can get a 1.4-3x lossless speedup when generating videos on A100s, 4090s, and H100s! Chipmunks also play with more open-source models: Mochi, Wan, & others (w/ tutorials for integration) 🐿️