Tri Dao
@tri_dao
35K Followers · 2K Following · 54 Media · 879 Statuses
Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.
Stanford, CA
Joined May 2012
FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS! 1/
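A minimal sketch of how FlashAttention is typically invoked from PyTorch via the flash-attn package; FlashAttention-3 is expected to expose a similar drop-in function on Hopper, but the exact import path there is an assumption, so check the repo.

```python
# Minimal sketch of calling FlashAttention through the flash-attn package.
# Assumes a CUDA GPU and the `flash_attn` pip package; FlashAttention-3 should
# offer a similar drop-in function, but verify the import path for Hopper.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 64
# flash-attn expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on GPU.
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused attention: never materializes the (seqlen x seqlen) score matrix.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```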
AI has been built on one vendor's stack for too long. AMD's GPUs now offer state-of-the-art peak compute and memory bandwidth, but the lack of mature software (the "CUDA moat") keeps that power locked away. Time to break it and ride into our multi-silicon future. 🌊 It's been a…
Today's LLMs are painfully slow and expensive. They are autoregressive and spit out words sequentially. One. At. A. Time. Our dLLMs generate text in parallel, delivering answers up to 10X faster. Now we've raised $50M to scale them. Full story from @russellbrandom on techcrunch.com:
Diffusion models already power AI image generators, but Inception thinks they can be even more powerful when applied to software development.
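For intuition only, here is a toy sketch of the parallel-decoding idea behind diffusion-style LLMs: start from a fully masked sequence and commit several positions per refinement step, so generation takes a handful of model calls rather than one call per token. The `propose_tokens` function is hypothetical and stands in for the actual dLLM; this is not Inception's method.

```python
# Toy sketch (not Inception's implementation) of why diffusion-style decoding
# can be parallel: unmask many positions per refinement step instead of one
# token per forward pass. `propose_tokens` is a hypothetical stand-in for a
# masked-diffusion model call.
import random

MASK = "<mask>"
SEQ_LEN, NUM_STEPS = 16, 4            # 4 model calls instead of 16
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]

def propose_tokens(seq):
    """Hypothetical model: propose a token for every masked position at once."""
    return {i: random.choice(VOCAB) for i, t in enumerate(seq) if t == MASK}

seq = [MASK] * SEQ_LEN
for step in range(NUM_STEPS):
    proposals = propose_tokens(seq)                 # one parallel model call
    masked = list(proposals)
    # Commit a fraction of positions each step (confidence-based in practice).
    for i in random.sample(masked, k=max(1, len(masked) // (NUM_STEPS - step))):
        seq[i] = proposals[i]
print(seq)
```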
Tons of effort from IBM and vLLM folks to make these hybrid models go fast. Thank you!
Hybrid models like Qwen3-Next, Nemotron Nano 2 and Granite 4.0 are now fully supported in vLLM! Check out our latest blog from the vLLM team at IBM to learn how the vLLM community has elevated hybrid models from experimental hacks in V0 to first-class citizens in V1. 🔗
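A minimal serving sketch, assuming the vLLM offline API; the checkpoint name below is only an example hybrid-model id, so substitute whichever supported hybrid you actually use.

```python
# Minimal vLLM sketch for serving one of the hybrid (attention + SSM) models.
# The model id below is illustrative; swap in whichever hybrid checkpoint you
# use (Qwen3-Next, Nemotron Nano 2, Granite 4.0, ...).
from vllm import LLM, SamplingParams

llm = LLM(model="ibm-granite/granite-4.0-h-tiny")   # example hybrid checkpoint
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain why hybrid SSM models decode quickly."], params)
print(outputs[0].outputs[0].text)
```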
Thank you @schmidtsciences for the 2025 #AI2050 Early Career Fellowship supporting my work on self-improving AI systems: as AI gets better, it should help human experts design better model architectures and faster training & inference systems
We're excited to welcome 28 new AI2050 Fellows! This 4th cohort of researchers are pursuing projects that include building AI scientists, designing trustworthy models, and improving biological and medical research, among other areas. https://t.co/8oY7xdhxvF
Congratulations to @tri_dao and @ZhongingAlong on being named AI2050 Early Career Fellows by @schmidtsciences! The AI2050 fellowships fund researchers working to solve hard problems in AI and improve technology for the benefit of humanity by 2050. https://t.co/FSNfP2YFtE
State space architecture for a state-of-the-art voice model!
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3, the state-of-the-art model for real-time conversation. What makes Sonic-3 great:
- Breakthrough naturalness: laughter and full emotional range
- Lightning fast: …
Feels like a dream! I’ve recently started my Ph.D. in Computer Science @Princeton! Working on exciting research with Professors @HazanPrinceton and @tri_dao 🤩
SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵
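To make the efficiency claim concrete, here is a minimal sketch of the linear recurrence inside a generic (diagonal) SSM layer, not the exact architecture studied in the paper: the recurrent state has a fixed size, so per-token decoding cost stays constant instead of growing with context length the way a KV cache does.

```python
# Minimal sketch of the linear recurrence at the heart of an SSM layer
# (generic diagonal SSM, not the architecture from the paper): each step costs
# O(d_state * d_model) and the state size is constant, so decoding does not
# slow down as the context grows.
import torch

d_model, d_state, seqlen = 8, 16, 1024
A = torch.rand(d_state) * 0.99          # per-channel decay
B = torch.randn(d_state, d_model) * 0.1
C = torch.randn(d_model, d_state) * 0.1

h = torch.zeros(d_state)                # fixed-size recurrent state
x = torch.randn(seqlen, d_model)
ys = []
for t in range(seqlen):
    h = A * h + B @ x[t]                # h_t = A * h_{t-1} + B x_t (elementwise A)
    ys.append(C @ h)                    # y_t = C h_t
y = torch.stack(ys)                     # (seqlen, d_model); state stayed O(d_state)
```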
This work, led by @_junxiong_wang and @ben_athi, is a first step towards building AI systems that evolve and get better as you use them. More to come!
What if your LLM inference automatically got faster the more you used it? Introducing ATLAS from the Together AI Turbo research team. Read more: https://t.co/ASRNUpqoAE Here’s Together AI Founder and Chief Scientist @tri_dao introducing ATLAS:
Clarification: I was comparing A @ B + C here, where the CuTe-DSL version is quite good at overlapping the epilogue. On the standard matmul A @ B, cuBLAS is very good. Updated numbers here
Using CUTLASS CuTe-DSL, Together AI's Chief Scientist @tri_dao announced that he has written kernels that are 50% faster than NVIDIA's latest cuBLAS 13.0 library for small-K reduction-dim shapes on Blackwell during today's Hot Chips conference. His kernels beat cuBLAS by using 2…
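A rough PyTorch-level illustration of the workload in question (A @ B + C with a small reduction dimension K, where the epilogue add is a large share of the total work): torch.addmm merely fuses the add into one library call, while the CuTe-DSL kernels overlap the epilogue inside the kernel itself, so this is a sketch of the shape, not a reproduction of the Blackwell numbers.

```python
# Rough illustration of the A @ B + C, small-K workload discussed above.
# torch.addmm fuses the epilogue add into the GEMM call; the CuTe-DSL kernels
# go further and overlap the epilogue inside the kernel. Timings are only a
# sketch on whatever GPU you run this on.
import torch

M, N, K = 8192, 8192, 64                       # small reduction dim K
A = torch.randn(M, K, device="cuda", dtype=torch.float16)
B = torch.randn(K, N, device="cuda", dtype=torch.float16)
C = torch.randn(M, N, device="cuda", dtype=torch.float16)

def bench(fn, iters=50):
    start, end = torch.cuda.Event(True), torch.cuda.Event(True)
    fn()                                       # warmup
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters     # ms per call

print("unfused :", bench(lambda: A @ B + C))
print("addmm   :", bench(lambda: torch.addmm(C, A, B)))
```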
We just released Nemotron Nano V2 with great accuracy and unprecedented inference speed. With the goal of true open-source models, we also released most of the data used to train this model. Check it out!
Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the…
🤖 OpenAI's open models are here. gpt-oss models just landed on Together AI. They achieve near-parity with o4-mini and were trained using o3 techniques. Build anything, deploy anywhere 🔥
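A hedged sketch of calling gpt-oss through the Together Python SDK; it assumes TOGETHER_API_KEY is set in the environment, and the model id string is an example that may not match the exact catalog name.

```python
# Sketch of querying gpt-oss via the Together Python SDK.
# Assumes TOGETHER_API_KEY is set; the model id is an example, check the catalog.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment
resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",          # example id
    messages=[{"role": "user", "content": "Summarize FlashAttention in two sentences."}],
)
print(resp.choices[0].message.content)
```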
Attention was never enough. The hybrid LLM era is here—and it’s moving fast. From Mamba to Jamba to Bamba, we mapped every major model that’s challenged the Transformer default in the past 18 months. 🧵 A timeline of what’s changed and why it matters ↓ 🔗
🚀We're expanding the Tencent Hunyuan open-source LLM ecosystem with four compact models (0.5B, 1.8B, 4B, 7B)! Designed for low-power scenarios like consumer-grade GPUs, smart vehicles, smart home devices, mobile phones, and PCs, these models support cost-effective fine-tuning
Falcon-H1 is a very dense research paper exploring the space of hybrid attention designs and tuning *every* hyperparameter there is. It's more interesting than the models themselves. If you were intrigued by that "AlphaGo move" slop, this is the real thing.
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Falcon's side-by-side attention-SSM hybrid model. Very detailed, from tokenizers to data preparation and optimization strategies.
Llama-8B training at 1.2M sequence length is now possible on a single H200 GPU with ALST + FA3 + Liger-Kernel. That's 2.4x longer than on a single H100. Ready-to-run recipes: https://t.co/fZZ1IL83lJ For how this is possible, see: https://t.co/rpJ3WPipSK To build FA3: …
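A sketch of the Liger-Kernel piece of that recipe, assuming the liger-kernel and transformers packages: patch the Hugging Face Llama implementation with fused Triton kernels before loading, and request a FlashAttention backend. The ALST sequence-parallel setup and the FA3 build itself live in the linked recipes; the checkpoint id here is only an example.

```python
# Sketch of the Liger-Kernel part of the long-context recipe: monkeypatch the
# HF Llama modules with fused Triton kernels, then load with a FlashAttention
# backend. ALST sequence parallelism and the FA3 build are handled separately
# in the linked recipes; the model id is an example.
import torch
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_llama()             # swap in fused RMSNorm/RoPE/CE kernels

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",       # example checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # or FA3 once built for your GPU
)
```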
⏱️ AI is making the verification process easier, with models verifying proofs in minutes. 💻 Now, @prfsanjeevarora, @chijinML, @danqi_chen and @PrincetonPLI have released Goedel Prover V2, a model that is more efficient and more accurate than any previous one. 👉 https://t.co/v7500VNytz
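For a sense of what such a prover produces, here is a tiny illustrative Lean 4 theorem (not taken from the Goedel Prover release): once the proof term elaborates, the Lean kernel has machine-checked it.

```lean
-- Tiny illustrative Lean 4 theorem (not from the Goedel Prover benchmarks):
-- if this proof term elaborates, the Lean kernel has verified the statement.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```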
🧠 Qwen3 just leveled up on Together AI 🚀 Qwen3-235B-A22B-Instruct-2507-FP8 isn't just another model update - it's a leap forward 📈