Dan Fu Profile

Dan Fu
@realDanFu
Followers: 7K · Following: 1K · Media: 190 · Statuses: 813

VP, Kernels @togethercompute · Assistant Professor @ucsd_cse · Looking for talented kernel engineers and performance engineers!

Joined September 2019
@realDanFu
Dan Fu
1 year
Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything ML Sys/efficiency/etc please reach out & put my name on your application! Until then
47
40
573
@realDanFu
Dan Fu
10 hours
It's pretty exciting to see this model on the platform - open-source models are starting to match or even outperform the SoTA frontier models!
@togethercompute
Together AI
10 hours
🚀 We're going live with @Kimi_Moonshot on Nov 19 for a technical deep dive on Kimi K2 Thinking. Learn about the 1T-parameter MoE that allows your AI agent to make 300 tool calls in one run. Register: https://t.co/JV3eIGqBtZ
1
0
6
@togethercompute
Together AI
20 days
🎬 Video generation is now available on Together AI. Through our partnership with @Runware, we're integrating 20+ video models (Sora 2, Veo 3, PixVerse V5, Seedance) and 15+ image models. All through the same APIs you use for text inference →
4
5
24
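Since the pitch above is that video runs through the same API surface as text inference, here is a minimal hypothetical sketch of what such a call could look like over plain HTTP. The endpoint path, model id, and request fields are illustrative assumptions, not Together's documented schema.

```python
# Hypothetical sketch: calling a video model through the same HTTP-API
# style used for text inference. Endpoint path, model id, and request
# fields are assumptions for illustration, not a documented schema.
import os
import requests

API_URL = "https://api.together.xyz/v1/videos/generations"  # assumed path
headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

resp = requests.post(
    API_URL,
    headers=headers,
    json={
        "model": "example/video-model",  # placeholder model id
        "prompt": "A timelapse of fog rolling over the San Diego coastline",
        "duration_seconds": 5,           # assumed parameter name
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # response shape depends on the actual API
```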
@realDanFu
Dan Fu
1 month
Great work on self-adaptive speculators led by @ben_athi and team! Very exciting!
@ben_athi
Ben Athiwaratkun
1 month
Most speculative decoding research focuses on algorithms. But we know that data matters a ton! (e.g. no matter how good the spec algorithm is, if it's trained on bad & misaligned data, the speed will be poor) What if we build on algorithms that make data really shine?! In
0
0
7
@ben_athi
Ben Athiwaratkun
1 month
Most speculative decoding research focuses on algorithms. But we know that data matters a ton! (e.g. no matter how good the spec algorithm is, if it's trained on bad & misaligned data, the speed will be poor) What if we build on algorithms that make data really shine?! In
together.ai
LLM inference that gets faster as you use it. Our runtime-learning accelerator adapts continuously to your workload, delivering 500 TPS on DeepSeek-V3.1, a 4x speedup over baseline performance...
@tri_dao
Tri Dao
1 month
This work, led by @_junxiong_wang and @ben_athi, is a first step towards building AI systems that evolve and get better as you use them. More to come!
0
5
25
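The thread above argues that speculative decoding speed is as much about draft/target data alignment as about the algorithm. A minimal sketch of the greedy speculative loop makes the mechanism concrete; `draft` and `target` are hypothetical model objects with assumed `argmax_next` / `argmax_batch` interfaces.

```python
# Minimal sketch of greedy speculative decoding. `draft` and `target`
# are hypothetical stand-ins for real models; the assumed interfaces
# are argmax_next(ctx) -> token and argmax_batch(prefix, proposal) ->
# the target's predicted token at each proposal position.
def speculative_step(draft, target, prefix, k=4):
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft.argmax_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The expensive target model scores all k positions in ONE pass.
    target_preds = target.argmax_batch(prefix, proposal)

    # 3. Accept the longest prefix where draft and target agree; on the
    #    first disagreement, keep the target's token and stop.
    accepted = []
    for drafted, expected in zip(proposal, target_preds):
        if drafted != expected:
            accepted.append(expected)
            break
        accepted.append(drafted)
    return accepted  # 1..k tokens gained per target forward pass

# A draft trained on data aligned with the target's actual workload gets
# most proposals accepted (~k tokens per target pass); a misaligned
# draft degenerates to ~1 token per pass, i.e., no speedup - which is
# the point the thread makes about data quality.
```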
@realDanFu
Dan Fu
1 month
Congrats @MichaelPoli6 @Massastrello @athmsx @exnx on the release! Building really cool things over at @RadicalNumerics :)
@RadicalNumerics
Radical Numerics
1 month
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to
1
0
10
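For readers puzzled by "30B params (3B active)": in a sparse MoE, a learned router activates only a top-k subset of experts per token, so only that subset's weights participate in that token's forward pass. A toy PyTorch sketch (sizes illustrative, not RND1's actual config):

```python
# Toy sparse-MoE layer: each token is routed to top_k of n_experts
# experts, so only a fraction of total parameters is "active" per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# With 16 experts and top_k=2, each token touches 2/16 of the expert
# weights - the same mechanism that lets a 30B-param model spend only
# ~3B params per token.
```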
@realDanFu
Dan Fu
1 month
Cartridges from @EyubogluSabri is one of the more creative new approaches to modeling I’ve seen in a while - great work and great profile from @StanfordHAI!
@EyubogluSabri
Sabri Eyuboglu
1 month
@StanfordHAI just ran this story on self-study and cartridges -- it's a really nice overview for those curious about our work
0
2
10
@realDanFu
Dan Fu
2 months
Grafting is accepted to #NeurIPS2025 as an Oral! New methods for converting a trained diffusion transformer into a new architecture (like Hyena, SSM, etc). Really top-notch work by @keshigeyan on this one. Check out his post below for demos, analysis, and models!
@keshigeyan
Keshigeyan Chandrasegaran
2 months
Grafting Diffusion Transformers accepted to #NeurIPS2025 as an Oral! We have lots of interesting analysis, a test bed for model grafting, and insights🚀 📄Paper: https://t.co/OjsrOZi7in 🌎Website:
0
2
18
@keshigeyan
Keshigeyan Chandrasegaran
2 months
Grafting Diffusion Transformers accepted to #NeurIPS2025 as an Oral! We have lots of interesting analysis, a test bed for model grafting, and insights🚀 📄Paper: https://t.co/OjsrOZi7in 🌎Website:
arxiv.org
Designing model architectures requires decisions such as selecting operators (e.g., attention, convolution) and configurations (e.g., depth, width). However, evaluating the impact of these...
@keshigeyan
Keshigeyan Chandrasegaran
5 months
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 https://t.co/fjOTVqfVZr Co-led with @MichaelPoli6
7
36
206
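To make the grafting idea above concrete: swap one operator inside a pretrained block (say, self-attention for a gated long convolution) and regress the replacement onto the frozen original's outputs on a small calibration set, so the edit fits a small compute budget. A simplified caricature under those assumptions, not the paper's exact recipe:

```python
# Simplified caricature of operator grafting: fit a new mixer to mimic a
# frozen pretrained operator on calibration activations, then swap it in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvMixer(nn.Module):
    """Stand-in "Hyena-like" replacement: gated depthwise long conv."""
    def __init__(self, d_model, kernel_size=63):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq, d_model)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return y * torch.sigmoid(self.gate(x))

def graft(old_op, new_op, calib_batches, lr=1e-3):
    """Fit new_op to mimic old_op on a small calibration set."""
    opt = torch.optim.AdamW(new_op.parameters(), lr=lr)
    for x in calib_batches:                      # activations entering the op
        with torch.no_grad():
            teacher = old_op(x)                  # frozen pretrained operator
        loss = F.mse_loss(new_op(x), teacher)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return new_op                                # drop-in replacement
```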
@realDanFu
Dan Fu
3 months
It's really exciting to see @OpenAI releasing open-source models again. These models look really great, excited to see what we can do with them! 120B available now on @togethercompute, more coming soon!
@togethercompute
Together AI
3 months
🤖 OpenAI's open models are here. gpt-oss models just landed on Together AI. Achieves near-parity with o4-mini, trained using o3 techniques. Build anything, deploy anywhere 🔥
0
0
15
@realDanFu
Dan Fu
3 months
Crazy fast!! Great work from @haoailab
@haoailab
Hao AI Lab
3 months
(1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing the FastWan series, a family of fast video generation models trained via a new recipe we term "sparse distillation", which speeds up video denoising by 70X! 🖥️ Live
0
1
4
@realDanFu
Dan Fu
3 months
DeepCogito models are available and scaling on Together - check them out!
@drishanarora
Drishan Arora
3 months
A small update - we had more traffic than anticipated. However, the endpoints are now scalable on Together AI for all models, including the 671B MoE. Test out the model here: https://t.co/Od1NXYVBxU (A huge thanks to the folks at @togethercompute for making this happen so
0
0
3
@realDanFu
Dan Fu
3 months
It’s been great working with you @mjlbach! Great models + great kernels & infra -> amazing things
@mjlbach
Michael Lingelbach
3 months
Working with @togethercompute has been one of the greatest accelerants for our research & inference team. We've scaled to thousands of chips seamlessly and migrated across multiple architectures thanks to their amazing kernel group. @vipulved is also a personal icon of mine.
0
0
2
@realDanFu
Dan Fu
4 months
I really enjoyed this talk from @bariskasikci at @ESFoMo - some really fine-grained analysis of the compute patterns of LLM serving in the throughput-bound regime, and how to schedule operations to push the boundaries (via a linear program)! Great work!
@ESFoMo
ES-FoMo@ICML2025
4 months
@wanchao_ Next we have @bariskasikci with a talk on the quest for blazingly fast LLM inference!
0
3
15
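As a toy illustration of the "schedule as a linear program" framing (the talk's actual formulation is surely richer), one can choose per-tick prefill and decode token counts to maximize throughput under compute and memory-bandwidth budgets; every coefficient below is made up:

```python
# Toy LP for throughput-bound serving: pick how many prefill and decode
# tokens to run per scheduler tick. All coefficients are invented for
# illustration; the real problem has many more variables and constraints.
from scipy.optimize import linprog

# Variables: x = [prefill_tokens, decode_tokens] per tick.
# linprog minimizes, so negate the weights to maximize.
c = [-0.2, -1.0]        # assumed relative value of prefill vs decode tokens

A_ub = [
    [2.0, 1.0],         # FLOPs per token: prefill is compute-heavy
    [0.5, 3.0],         # bytes moved per token: decode is bandwidth-heavy
]
b_ub = [1000.0, 900.0]  # per-tick compute and bandwidth budgets (assumed)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
prefill, decode = res.x
print(f"per tick: {prefill:.0f} prefill tokens, {decode:.0f} decode tokens")
```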
@realDanFu
Dan Fu
4 months
ES-FoMo is back tomorrow! Come join us in East Exhibition Hall A bright and early at 8:30 AM for a great slate of invited talks, orals, spotlight lightning talks, and 150 posters!
@ESFoMo
ES-FoMo@ICML2025
4 months
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
0
3
14
@realDanFu
Dan Fu
4 months
And @keshigeyan is going to be presenting Grafting - a great collaboration with @MichaelPoli6 on how to distill pretrained diffusion models into new architectures (Transformers -> Hyenas) 4/
@keshigeyan
Keshigeyan Chandrasegaran
5 months
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 https://t.co/fjOTVqfVZr Co-led with @MichaelPoli6
0
3
7
@realDanFu
Dan Fu
4 months
Two papers at the workshop I’m a bit fond of… @austinsilveria and @SohamGovande are going to be presenting Chipmunk - come chat with them about how they made video diffusion 3.7x faster! (With custom column-sparse attention kernels) 3/
@austinsilveria
Austin Silveria
7 months
Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas - a collaboration with @SohamGovande and @realDanFu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
1
2
10
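To sketch the "cross-step deltas" idea in the quoted tweet: between adjacent diffusion steps most activations barely change, so one can cache the previous step's outputs and recompute only the rows whose inputs moved the most. The sketch below covers the MLP path only (attention mixes positions, which is why the real system needs custom column-sparse kernels); the keep fraction and granularity are illustrative assumptions:

```python
# Illustrative cross-step delta cache for a diffusion model's MLP path:
# reuse last step's outputs and recompute only the most-changed rows.
import torch

class DeltaCache:
    def __init__(self, keep_fraction=0.1):
        self.keep_fraction = keep_fraction  # fraction of rows to recompute
        self.prev_in = None
        self.prev_out = None

    def __call__(self, f, x):           # f: rowwise op (e.g., MLP); x: (seq, d)
        if self.prev_in is None:        # first diffusion step: full compute
            self.prev_in, self.prev_out = x, f(x)
            return self.prev_out
        # Rank rows by how much their inputs changed since the last step.
        delta = (x - self.prev_in).norm(dim=-1)           # (seq,)
        k = max(1, int(self.keep_fraction * x.shape[0]))
        hot = delta.topk(k).indices                       # most-changed rows
        out = self.prev_out.clone()
        out[hot] = f(x[hot])            # sparse recompute; the rest is reused
        self.prev_in, self.prev_out = x, out
        return out
```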