Dan Fu Profile
Dan Fu

@realDanFu

Followers: 6K
Following: 1K
Media: 190
Statuses: 786

Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also running the kernels team @togethercompute.

Joined September 2019
@realDanFu
Dan Fu
11 months
Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything MLSys/efficiency/etc., please reach out & put my name on your application! Until then.
47
39
571
@realDanFu
Dan Fu
5 hours
I really enjoyed this talk from @bariskasikci at @ESFoMo - some really fine-grained analysis of the compute patterns of LLM serving in the throughput-bound regime, and how to schedule operations to push the boundaries (as a linear program)! Great work!
@ESFoMo
ES-FoMo@ICML2025
6 hours
@wanchao_ Next we have @bariskasikci with a talk on the quest for blazingly fast LLM inference!
0
1
9
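For context on "scheduling as a linear program": below is a toy sketch of how one might cast a serving-time scheduling choice (here, the prefill/decode token mix per batch iteration) as an LP. The cost numbers are made up and this is not the formulation from the talk, just an illustration of the general idea.

```python
# Toy LP sketch (my own illustration, not the talk's actual formulation).
from scipy.optimize import linprog

# Decision variables: x = [prefill_tokens, decode_tokens] scheduled per iteration.
# linprog minimizes c @ x, so maximizing decode throughput means minimizing -decode.
c = [0.0, -1.0]

A_ub = [
    [2.0, 0.5],    # made-up FLOP cost per prefill / decode token (compute budget)
    [0.2, 1.0],    # made-up bytes moved per prefill / decode token (bandwidth budget)
    [-1.0, 0.0],   # admit at least 256 prefill tokens per iteration: -prefill <= -256
]
b_ub = [4096.0, 2048.0, -256.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)  # LP-optimal prefill/decode token mix under the toy budgets
```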
@realDanFu
Dan Fu
1 day
ES-FoMo is back tomorrow! Come join us in East Exhibition Hall A bright and early at 8:30 AM for a great slate of invited talks, orals, spotlight lightning talks, and 150 posters!
@ESFoMo
ES-FoMo@ICML2025
1 day
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
0
2
14
@realDanFu
Dan Fu
2 days
And @keshigeyan is going to be presenting Grafting - a great collaboration with @MichaelPoli6 on how to distill pretrained diffusion models into new architectures (Transformers -> Hyenas). 4/
@keshigeyan
Keshigeyan Chandrasegaran
1 month
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 Co-led with @MichaelPoli6
0
3
7
@realDanFu
Dan Fu
2 days
Two papers at the workshop I’m a bit fond of… @austinsilveria and @SohamGovande are going to be presenting Chipmunk - come chat with them about how they made video diffusion 3.7x faster (with custom column-sparse attention kernels)! 3/
@austinsilveria
Austin Silveria
3 months
Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas - a collaboration with @SohamGovande and @realDanFu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
1
2
9
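As a rough illustration of what "column-sparse attention" means, here is my own PyTorch sketch - not Chipmunk's actual CUDA kernels or its cross-step delta trick: keep only the most important key columns per head and mask out the rest before the softmax.

```python
# Minimal column-sparse attention sketch (illustrative only).
import torch
import torch.nn.functional as F

def column_sparse_attention(q, k, v, keep_ratio=0.25):
    """q, k, v: [batch, heads, seq, dim]. Keeps only the highest-scoring key
    columns per head and masks out the rest before the softmax."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5    # [B, H, Sq, Sk]

    # Score each key column by its average attention logit over all queries,
    # then keep the top `keep_ratio` fraction of columns.
    col_importance = scores.mean(dim=-2)                     # [B, H, Sk]
    k_keep = max(1, int(keep_ratio * scores.shape[-1]))
    top_cols = col_importance.topk(k_keep, dim=-1).indices   # [B, H, k_keep]

    mask = torch.zeros_like(col_importance, dtype=torch.bool)
    mask.scatter_(-1, top_cols, True)                        # True = keep column
    scores = scores.masked_fill(~mask.unsqueeze(-2), float("-inf"))

    return F.softmax(scores, dim=-1) @ v

# Example: q, k, v = torch.randn(3, 1, 8, 128, 64); output shape [1, 8, 128, 64].
```

In practice the speedup comes from skipping the masked columns entirely inside a fused kernel, rather than materializing the full score matrix as this sketch does.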
@realDanFu
Dan Fu
2 days
On Saturday we’re hosting the ES-FoMo workshop, with @tri_dao, @dan_biderman, @simran_s_arora, @m_ryabinin and others - we’ve got a great slate of papers and invited talks, come join us! (More on the speakers soon.) 2/
@ESFoMo
ES-FoMo@ICML2025
2 months
ES-FoMo is back for round three at #ICML2025! Join us in Vancouver on Saturday July 19 for a day dedicated to Efficient Systems for Foundation Models: from 💬 reasoning models to 🖼️ scalable multimodality, 🧱 efficient architectures, and more! Submissions due May 26! More below 👇
1
3
13
@realDanFu
Dan Fu
2 days
I’m off to #ICML2025 in Vancouver! (After an unusually eventful first flight - our plane had a wing problem, so we had to make an emergency landing back at SFO & switch planes.) Reach out if you’d like to chat about (mega)kernels, @togethercompute, or anything MLSys! 1/
2
0
22
@realDanFu
Dan Fu
2 days
Fastest DeepSeek! Super proud of the amazing inference team at Together for pulling this off!
@togethercompute
Together AI
2 days
Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528. We’ve upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs - and the results speak for themselves:
📈 Highest known serverless throughput: 334 tokens/sec
🏃 Fastest time to first answer token:
0
0
7
@realDanFu
Dan Fu
5 days
Synthetics like associative recall and MQAR are a great guide to building models. Excited to see this work from @nick11roberts on creating new LMs!
@nick11roberts
Nicholas Roberts
5 days
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
0
1
12
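For anyone unfamiliar with MQAR (multi-query associative recall), here is a minimal sketch of the kind of synthetic Dan is referring to. It is my own simplified version, not the exact task parameters from the papers; it just shows the shape of the data.

```python
# Minimal MQAR-style synthetic generator (illustrative sketch).
import random

def make_mqar_example(num_kv_pairs=8, num_queries=4, vocab_size=64, seed=0):
    """Emit (tokens, targets): the prompt lists key-value pairs, then re-queries
    some keys; the model must recall the value bound to each queried key."""
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_kv_pairs)
    values = rng.sample(range(vocab_size, 2 * vocab_size), num_kv_pairs)
    kv = dict(zip(keys, values))

    tokens = []
    for k in keys:                  # context: k1 v1 k2 v2 ...
        tokens += [k, kv[k]]

    queried = rng.sample(keys, num_queries)
    targets = []
    for k in queried:               # queries: the model should emit kv[k] after each k
        tokens.append(k)
        targets.append(kv[k])
    return tokens, targets

tokens, targets = make_mqar_example()
print(tokens, targets)
```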
@realDanFu
Dan Fu
8 days
This is really cool! There are a ton of places where a dynamic differentiable hierarchy makes sense. Awesome to see progress here!
@_albertgu
Albert Gu
8 days
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
0
1
19
@realDanFu
Dan Fu
11 days
HMAR code and models are out!
@KumbongHermann
Hermann
11 days
Happy to share that our HMAR code and pre-trained models are now publicly available. Please try them out here - code: checkpoints:
0
0
8
@realDanFu
Dan Fu
23 days
Day zero support for Flux kontext dev on Chipmunk! Great work @austinsilveria!
@austinsilveria
Austin Silveria
23 days
🐿️ chipmunk ship! flux kontext supported for up to 30% faster cute chipmunks!
0
0
8
@realDanFu
Dan Fu
25 days
What a throwback to weak supervision! Great work @JonSaadFalcon @ekellbuch @MayeeChen!
@JonSaadFalcon
Jon Saad-Falcon
25 days
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning
1
7
24
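A toy sketch of the general "combine weak verifiers" idea - my own simplified aggregation, not Weaver's actual weak-supervision-based estimator: weight each verifier by how much it agrees with its peers, then pick the candidate answer with the highest weighted score.

```python
# Toy weak-verifier aggregation (illustrative sketch).
import numpy as np

def select_answer(scores: np.ndarray) -> int:
    """scores[i, j] = verifier i's score for candidate answer j (higher = better).
    Weight each verifier by how often its top pick matches its peers' picks,
    then return the candidate with the highest weighted score."""
    picks = scores.argmax(axis=1)                        # each verifier's favorite
    agreement = np.array([
        (picks == picks[i]).sum() - 1 for i in range(len(picks))
    ], dtype=float)
    weights = (agreement + 1.0) / (agreement + 1.0).sum()  # smooth + normalize
    return int((weights[:, None] * scores).sum(axis=0).argmax())

# Example: 3 verifiers scoring 4 candidate answers.
scores = np.array([[0.2, 0.9, 0.1, 0.4],
                   [0.3, 0.8, 0.2, 0.5],
                   [0.9, 0.1, 0.3, 0.2]])
print(select_answer(scores))  # -> 1 (the two agreeing verifiers outvote the third)
```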
@realDanFu
Dan Fu
1 month
Chipmunks for everyone!
@SohamGovande
soham
1 month
Chipmunks can now hop across multiple GPU architectures (sm_80, sm_89, sm_90). You can get a 1.4-3x lossless speedup when generating videos on A100s, 4090s, and H100s! Chipmunks also play with more open-source models: Mochi, Wan, & others (w/ tutorials for integration) 🐿️
1
1
11