Dan Fu Profile
Dan Fu

@realDanFu

Followers: 6K
Following: 1K
Media: 190
Statuses: 797

Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also running the kernels team @togethercompute.

Joined September 2019
@realDanFu
Dan Fu
1 year
Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything MLSys/efficiency/etc., please reach out & put my name on your application! Until then
47
40
572
@realDanFu
Dan Fu
1 month
It's really exciting to see @OpenAI releasing open-source models again. These models look really great, excited to see what we can do with them! 120B available now on @togethercompute, more coming soon!
@togethercompute
Together AI
1 month
🤖OpenAI's open models are here. gpt-oss models just landed on Together AI. Achieves near-parity with o4-mini, trained using o3 techniques. Build anything, deploy anywhere🔥
0
0
15
@realDanFu
Dan Fu
2 months
Crazy fast!! Great work from @haoailab
@haoailab
Hao AI Lab
2 months
(1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing the FastWan series, a family of fast video generation models trained via a new recipe we term "sparse distillation", speeding up video denoising by 70X! 🖥️ Live
0
1
4
@realDanFu
Dan Fu
2 months
DeepCogito models available and scaling on Together, check them out
@drishanarora
Drishan Arora
2 months
A small update - we had more traffic than anticipated. However, the endpoints are now scalable on Together AI for all models, including the 671B MoE. Test out the model here: https://t.co/Od1NXYVBxU (A huge thanks to the folks at @togethercompute for making this happen so
0
0
3
@realDanFu
Dan Fu
2 months
It’s been great working with you @mjlbach! Great models + great kernels & infra -> amazing things
@mjlbach
Michael Lingelbach
2 months
Working with @togethercompute has been one of the greatest accelerants for our research & inference team. We've scaled to thousands of chips seamlessly and migrated across multiple architectures thanks to their amazing kernel group. @vipulved is also a personal icon of mine.
0
0
2
@realDanFu
Dan Fu
2 months
I really enjoyed this talk from @bariskasikci at @ESFoMo - some really fine-grained analysis of the compute patterns of LLM serving in the throughput-bound regime, and how to schedule operations to push the boundaries (via a linear program)! Great work!
@ESFoMo
ES-FoMo@ICML2025
2 months
@wanchao_ Next we have @bariskasikci with a talk on the quest for blazingly fast LLM inference!
0
3
15
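To make the "linear program" framing concrete, here is a toy scheduling LP (a generic sketch with made-up constants, not the formulation from the talk): choose a per-step mix of prefill and decode requests that maximizes tokens served under compute and memory budgets.

```python
# Generic toy LP (an illustration, not the talk's formulation): pick how many
# prefill vs. decode requests to batch per step, maximizing tokens served
# subject to a FLOP budget and a KV-cache memory budget. All constants invented.
from scipy.optimize import linprog

# Variables: x = [prefill_reqs, decode_reqs] per scheduling step.
tokens_per = [512, 1]          # tokens produced per request type
flops_per = [40.0, 1.0]        # relative compute cost per request
mem_per = [4.0, 1.0]           # relative KV-cache footprint per request

res = linprog(
    c=[-t for t in tokens_per],            # maximize tokens -> minimize -tokens
    A_ub=[flops_per, mem_per],
    b_ub=[400.0, 64.0],                    # per-step compute / memory budgets
    bounds=[(0, None), (0, None)],
)
prefill, decode = res.x
print(f"prefill={prefill:.1f}, decode={decode:.1f} requests/step")
```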
@realDanFu
Dan Fu
2 months
ES-FoMo is back tomorrow! Come join us in East Exhibition Hall A bright and early at 8:30AM for a great slate of invited talks, orals, spotlight lightning talks, and 150 posters!
@ESFoMo
ES-FoMo@ICML2025
2 months
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
0
2
14
@realDanFu
Dan Fu
2 months
And @keshigeyan is going to be presenting about Grafting - a great collaboration with @MichaelPoli6 on how to distill pretrained diffusion models into new architectures (Transformers -> Hyenas) 4/
@keshigeyan
Keshigeyan Chandrasegaran
3 months
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 https://t.co/fjOTVqfVZr Co-led with @MichaelPoli6
0
3
7
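For intuition about grafting, a minimal sketch of the general block-swap distillation idea (a toy under stated assumptions, not the paper's recipe): replace one pretrained attention block with a cheaper operator and train only the replacement to match the original block's outputs. The depthwise convolution below is just a stand-in for a Hyena-style operator.

```python
# Hedged sketch of block-swap distillation (not the Grafting recipe): swap one
# pretrained block for a new operator and train it to match the old block's
# outputs on sampled activations, instead of retraining the whole model.
import torch
import torch.nn as nn

teacher_block = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
student_block = nn.Sequential(            # stand-in for a Hyena-style operator
    nn.Conv1d(64, 64, kernel_size=7, padding=3, groups=64),
)

opt = torch.optim.Adam(student_block.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(8, 128, 64)                    # activations entering the block
    with torch.no_grad():
        target, _ = teacher_block(x, x, x)         # what the old block produced
    pred = student_block(x.transpose(1, 2)).transpose(1, 2)
    loss = nn.functional.mse_loss(pred, target)    # match outputs locally
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```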
@realDanFu
Dan Fu
2 months
Two papers at the workshop I’m a bit fond of… @austinsilveria and @SohamGovande are going to be presenting Chipmunk - come chat with them about how they made video diffusion 3.7x faster! (With custom column-sparse attention kernels) 3/
@austinsilveria
Austin Silveria
5 months
Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas--collaboration with @SohamGovande and @realDanFu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
1
2
10
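A rough picture of what column-sparse attention does (a toy PyTorch sketch; the actual Chipmunk speedups come from custom CUDA kernels): keep only the key columns that receive the most attention mass and mask out the rest.

```python
# Toy column-sparse attention (an illustration of the idea, not the Chipmunk
# kernels): rank key columns by total attention mass and attend only to the
# top-k, masking the rest with -inf before the softmax.
import torch

def column_sparse_attention(q, k, v, keep: int):
    # q, k, v: (heads, seq, dim)
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5    # (heads, seq, seq)
    # Rank key columns by the attention mass they receive across all queries
    col_mass = scores.softmax(dim=-1).sum(dim=-2)            # (heads, seq)
    topk = col_mass.topk(keep, dim=-1).indices               # (heads, keep)
    mask = torch.zeros_like(col_mass, dtype=torch.bool)
    mask.scatter_(-1, topk, True)                            # keep these columns
    scores = scores.masked_fill(~mask.unsqueeze(-2), float("-inf"))
    return scores.softmax(dim=-1) @ v

heads, seq, dim = 8, 256, 64
q, k, v = (torch.randn(heads, seq, dim) for _ in range(3))
out = column_sparse_attention(q, k, v, keep=64)   # attend to 25% of columns
print(out.shape)  # torch.Size([8, 256, 64])
```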
@realDanFu
Dan Fu
2 months
On Saturday we’re hosting the ES-FoMo workshop, with @tri_dao, @dan_biderman, @simran_s_arora, @m_ryabinin and others - we’ve got a great slate of papers and invited talks, come join us! (More on the great slate of speakers soon) https://t.co/w2nhjqNxPb 2/
@ESFoMo
ES-FoMo@ICML2025
4 months
ES-FoMo is back for round three at #ICML2025! Join us in Vancouver on Saturday July 19 for a day dedicated to Efficient Systems for Foundation Models: from 💬reasoning models to🖼️scalable multimodality, 🧱efficient architectures, and more! Submissions due May 26! More below 👇
1
3
15
@realDanFu
Dan Fu
2 months
I’m off to #ICML2025 in Vancouver! (After an unusually eventful first flight - our plane had a wing problem, so we had to make an emergency landing back at SFO & switch planes) Reach out if you’d like to chat about (mega)kernels, @togethercompute, or anything MLSys! 1/
2
0
22
@realDanFu
Dan Fu
2 months
Fastest DeepSeek! Super proud of the amazing inference team at Together for pulling this off!
@togethercompute
Together AI
2 months
Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528 We’ve upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs—and the results speak for themselves: 📈 Highest known serverless throughput: 334 tokens/sec 🏃‍Fastest time to first answer token:
0
1
7
@realDanFu
Dan Fu
2 months
Synthetics like associative recall and MQAR are a great guide to building models. Excited to see this work from @nick11roberts on creating new LMs!
@nick11roberts
Nicholas Roberts
2 months
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
0
2
13
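For context on these synthetics, here is a toy generator for an MQAR-style (multi-query associative recall) example; it sketches the task family rather than any exact benchmark: the model sees interleaved key-value pairs and must return the value paired with each queried key.

```python
# Toy generator for an MQAR-style associative-recall synthetic (a sketch of the
# task family, not the exact benchmark): the sequence interleaves key-value
# pairs, then repeats some keys; the target is the value paired with each key.
import random

def make_mqar_example(num_pairs=8, num_queries=4, vocab=64, seed=0):
    rng = random.Random(seed)
    keys = rng.sample(range(vocab), num_pairs)       # distinct keys
    values = [rng.randrange(vocab) for _ in keys]
    kv = dict(zip(keys, values))
    seq = [tok for pair in zip(keys, values) for tok in pair]  # k1 v1 k2 v2 ...
    queries = rng.sample(keys, num_queries)
    inputs = seq + queries
    targets = [kv[q] for q in queries]   # model must recall each queried value
    return inputs, targets

x, y = make_mqar_example()
print(x, y)
```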
@realDanFu
Dan Fu
2 months
This is really cool! There are a ton of places where a dynamic differentiable hierarchy makes sense. Awesome to see progress here!
@_albertgu
Albert Gu
2 months
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
0
1
19
@_albertgu
Albert Gu
2 months
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
@sukjun_hwang
Sukjun (June) Hwang
2 months
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
61
192
1K
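A toy sketch of the dynamic-chunking idea (an illustration, not the H-Net architecture): score each byte position as a possible chunk boundary and pool every variable-length chunk into one vector. A real model needs a differentiable boundary mechanism; the hard threshold here is only for readability.

```python
# Toy dynamic-chunking sketch (an illustration of the idea in the thread, not
# H-Net): a boundary scorer marks chunk ends over a byte sequence, and each
# variable-length chunk is mean-pooled into a single higher-level vector.
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    def __init__(self, dim=64, vocab=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.boundary = nn.Linear(dim, 1)   # scores "end of chunk" per position

    def forward(self, bytes_in):            # bytes_in: (seq,) int64
        h = self.embed(bytes_in)            # (seq, dim)
        ends = torch.sigmoid(self.boundary(h)).squeeze(-1) > 0.5  # (seq,) bool
        ends[-1] = True                     # always close the final chunk
        chunks, start = [], 0
        for i, is_end in enumerate(ends.tolist()):
            if is_end:
                chunks.append(h[start:i + 1].mean(dim=0))
                start = i + 1
        return torch.stack(chunks)          # (num_chunks, dim)

x = torch.randint(0, 256, (32,))
print(DynamicChunker()(x).shape)
```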
@realDanFu
Dan Fu
2 months
HMAR code and models are out!
@KumbongHermann
Hermann
2 months
Happy to share that our HMAR code and pre-trained models are now publicly available. Please try them out here: code: https://t.co/HZloGGrLFG checkpoints:
0
0
8
@KumbongHermann
Hermann
3 months
Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from
0
11
39
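To unpack "next-scale prediction": a toy decomposition in the spirit of the VAR formulation the tweet describes (a sketch, not the HMAR model), representing an image as feature maps at increasing resolutions, where each scale only needs to add what the upsampled coarser scales miss.

```python
# Toy illustration of next-scale prediction (a sketch of the formulation the
# tweet describes, not HMAR): represent an image as feature maps at increasing
# resolutions; each scale is predicted conditioned on the upsampled coarser ones.
import torch
import torch.nn.functional as F

scales = [1, 2, 4, 8]                       # feature-map side lengths, coarse -> fine
img = torch.randn(1, 16, 8, 8)              # stand-in feature map (C=16)
prev = torch.zeros(1, 16, 1, 1)
for s in scales:
    target = F.adaptive_avg_pool2d(img, s)          # "tokens" at this scale
    cond = F.interpolate(prev, size=(s, s))         # upsampled coarser scales
    residual = target - cond                        # what the model must predict
    print(f"scale {s}x{s}: residual norm {residual.norm():.2f}")
    prev = target
```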