Tian Jin
@tjingrant
Followers 591 · Following 340 · Media 18 · Statuses 218
PhD student @MIT_CSAIL, previously @IBMResearch, @haverfordedu.
Cambridge, Massachusetts
Joined March 2015
Introducing Learned Asynchronous Decoding w/ friends from MIT/Google! LLM responses often have chunks of tokens that are semantically independent. We train LLMs to identify and decode them in parallel, speeding up inference by 1.46x geomean (AlpacaEval) w/ only 1.3% quality loss.
4 replies · 15 reposts · 70 likes
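A minimal control-flow sketch of the idea behind this announcement (the stub functions and names are mine, not the paper's API): once the model marks spans of the response as mutually independent, they can be decoded concurrently instead of strictly left to right.

```python
# Minimal control-flow sketch of learned asynchronous decoding; names are
# illustrative and generation is mocked so the sketch runs standalone.
from concurrent.futures import ThreadPoolExecutor

def generate_until_marker(prompt: str) -> tuple[str, list[str]]:
    """Stub for an LLM call that returns a prefix plus the independent
    sub-requests it identified (real systems learn special markers for this)."""
    return "Here are three fun facts:", ["fact about A", "fact about B", "fact about C"]

def generate_span(sub_request: str) -> str:
    """Stub for decoding one independent span; in practice these spans are
    decoded in parallel as extra sequences in the same batch."""
    return f"<answer to '{sub_request}'>"

def async_decode(prompt: str) -> str:
    prefix, spans = generate_until_marker(prompt)
    # Independent spans share no dependencies, so they can be decoded
    # concurrently; wall-clock time becomes max(span) instead of sum(spans).
    with ThreadPoolExecutor(max_workers=len(spans)) as pool:
        results = list(pool.map(generate_span, spans))
    return " ".join([prefix, *results])

print(async_decode("Tell me three fun facts."))
```

The win comes from the join: decoding time for the marked spans drops from roughly the sum of their lengths to roughly the longest one.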
Explore Eigen Banana, our post-trained image-edit model with lightning-fast speed! ⚡️
Releasing open-source Eigen-Banana-Qwen-Image-Edit: 4-second ⚡ instruction-based image edits trained on Pico-Banana-400K. Super fast with high image-editing quality. Open-source LoRA for Diffusers/DiffSynth-Studio + enterprise stack (EigenTrain/Inference/Deploy). Feel free
0 replies · 2 reposts · 14 likes
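A hypothetical usage sketch for a release like this, assuming a Diffusers-compatible pipeline: the repo IDs, the pipeline class, and even the pipeline's LoRA support are my assumptions here, not the project's documented API.

```python
# Hypothetical Diffusers-style usage sketch; model and LoRA repo IDs are
# assumptions, not the project's documented identifiers.
import torch
from diffusers import DiffusionPipeline
from PIL import Image

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",            # assumed base-model repo ID
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights("eigen-ai/eigen-banana-qwen-image-edit")  # assumed LoRA repo ID

source = Image.open("photo.jpg")       # any local image to edit
edited = pipe(prompt="make the sky look like sunset", image=source).images[0]
edited.save("edited.jpg")
```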
Releasing QuTLASS v0.2: fast, end-to-end quantization-aware training (QAT) with kernel support and applications!
1. Nanochat-QAT: a fully-quantized extension of @karpathy's nanochat
2. General QAT recipe with MXFP4 forward/MXFP8 backward GEMMs
3. Transformers/vLLM integrations
1 reply · 38 reposts · 157 likes
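For readers unfamiliar with QAT, here is a minimal PyTorch sketch of the general recipe: a toy symmetric int4 fake-quantizer with a straight-through estimator stands in for the MXFP4/MXFP8 microscaling GEMMs that QuTLASS implements as real kernels.

```python
# Minimal QAT sketch: a fake-quantized linear layer with a straight-through
# estimator (STE). Toy per-tensor int4 stands in for MXFP4/MXFP8 formats.
import torch
import torch.nn as nn

def fake_quant_int4(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().amax().clamp(min=1e-8) / 7.0   # symmetric int4 range [-7, 7]
    q = (x / scale).round().clamp(-7, 7) * scale   # quantize-dequantize
    return x + (q - x).detach()                    # STE: gradients flow as identity

class QATLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both activations and weights see quantization noise during training.
        return nn.functional.linear(fake_quant_int4(x), fake_quant_int4(self.weight), self.bias)

layer = QATLinear(16, 16)
loss = layer(torch.randn(4, 16)).pow(2).mean()
loss.backward()                                    # backward works thanks to the STE
```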
Asynchronous generation + text diffusion + token planning! From the awesome @tjingrant @danielmisrael
2 replies · 2 reposts · 10 likes
Diffusion 🤝 Autoregressive: fast, high-quality generation
"An hour of planning can save you 10 hours of doing." โจ๐ Planned Diffusion ๐ โจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ5% AR quality.
0 replies · 2 reposts · 2 likes
Plan autoregressively, denoise in parallel!
"An hour of planning can save you 10 hours of doing." โจ๐ Planned Diffusion ๐ โจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ5% AR quality.
0 replies · 2 reposts · 5 likes
Earlier this year, we introduced the idea of learned asynchronous decoding. Now we've brought it to diffusion!
"An hour of planning can save you 10 hours of doing." โจ๐ Planned Diffusion ๐ โจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ5% AR quality.
0 replies · 3 reposts · 15 likes
"An hour of planning can save you 10 hours of doing." โจ๐ Planned Diffusion ๐ โจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ5% AR quality.
7 replies · 46 reposts · 311 likes
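A control-flow sketch of the two-phase idea, with stub functions and illustrative names rather than the paper's implementation:

```python
# Sketch of Planned Diffusion's two phases: a short autoregressive pass
# drafts a plan tagging independent spans, then a diffusion LM denoises
# all spans in parallel. Stubs keep the sketch standalone.

def autoregressive_plan(prompt: str) -> list[str]:
    # Stub: real models emit learned control tokens that delimit the
    # independent spans of the response (and, e.g., their lengths).
    return ["<span: summary>", "<span: pros>", "<span: cons>"]

def parallel_denoise(spans: list[str]) -> list[str]:
    # Stub: all spans start fully masked and are refined together, so
    # latency scales with diffusion steps rather than total response length.
    return [f"<text generated for {s}>" for s in spans]

plan = autoregressive_plan("Compare diffusion and autoregressive LLMs.")
print(" ".join(parallel_denoise(plan)))
```

The plan is cheap because it is short; the bulk of the tokens are then produced in a fixed number of parallel denoising steps.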
Super bullish on intra-layer hybridization for LLMs. These are the reasons why.
3 replies · 154 reposts · 574 likes
NYC open-source AI infra contributors: we've launched a community research hub above Grand Central where GPUs go brrr 🔥🗽 A place to hack, benchmark, and collaborate: vLLM, SGLang, kernels, inference optimizations all welcome. Open space. Open source. Weekends too. Huge
7 replies · 10 reposts · 89 likes
Excited to share our work at Bytedance Seed! Knapsack RL: Unlocking Exploration of LLMs via Budget Allocation. Exploration in LLM training is crucial but expensive. Uniform rollout allocation is wasteful:
Easy tasks → always solved → 0 gradient
Hard tasks →
13 replies · 102 reposts · 642 likes
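A toy sketch of the budget-allocation intuition, based on my reading of the abstract rather than the paper's actual algorithm: in GRPO-style training, a task whose rollouts all pass or all fail yields zero advantage, so rollouts are most valuable where outcomes are likely to be mixed.

```python
# Greedy knapsack-style heuristic (illustrative, not the paper's method):
# spend a fixed rollout budget where the marginal chance of a *mixed*
# success/failure group, and hence a nonzero gradient, is largest.
import heapq

def mixed_prob(p: float, n: int) -> float:
    """Probability that n rollouts on a task with success rate p contain
    at least one success and one failure (i.e., a useful gradient)."""
    return 0.0 if n == 0 else 1.0 - p**n - (1.0 - p) ** n

def allocate(success_rates: list[float], budget: int) -> list[int]:
    alloc = [0] * len(success_rates)
    # Max-heap on marginal gain of giving a task one more rollout.
    heap = [(-(mixed_prob(p, 1) - mixed_prob(p, 0)), i) for i, p in enumerate(success_rates)]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        p, n = success_rates[i], alloc[i]
        heapq.heappush(heap, (-(mixed_prob(p, n + 1) - mixed_prob(p, n)), i))
    return alloc

# Easy (p=0.95), medium (p=0.5), and hard (p=0.05) tasks, 12 rollouts total:
print(allocate([0.95, 0.5, 0.05], budget=12))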
Introducing LLM.Q: quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM with natively quantized matmuls on consumer GPUs and single workstations. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
3 replies · 16 reposts · 141 likes
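A conceptual sketch of what a natively quantized matmul does, simplified to per-tensor int8 in NumPy (LLM.Q itself ships quantized CUDA/C++ kernels, not Python): quantize the inputs, multiply in integer space, and rescale once at the end.

```python
# Per-tensor int8 quantized matmul sketch: integer accumulation with a
# single float rescale, the core trick behind quantized training GEMMs.
import numpy as np

def quantize(x: np.ndarray) -> tuple[np.ndarray, float]:
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def quantized_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    # int32 accumulation avoids overflow; one float rescale at the end.
    return qa.astype(np.int32) @ qb.astype(np.int32) * (sa * sb)

a, b = np.random.randn(8, 16), np.random.randn(16, 4)
print(np.abs(quantized_matmul(a, b) - a @ b).max())  # small quantization error
```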
Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)
11 replies · 23 reposts · 170 likes
Congrats Xinyu!
Excited to share that #Multiverse has been accepted to #NeurIPS 2025! Couldn't have done it without such incredible collaborators. Thank you!!
0 replies · 0 reposts · 1 like
Excited to share what friends and I have been working on at @Standard_Kernel. We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring:
- Matmul 102%-105% perf
52 replies · 92 reposts · 993 likes
Excited to announce QuTLASS v0.1.0! QuTLASS is a high-performance library for low-precision deep learning kernels, following NVIDIA CUTLASS. The new release brings 4-bit NVFP4 microscaling and fast transforms to NVIDIA Blackwell GPUs (including the B200!) [1/N]
3 replies · 35 reposts · 220 likes
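A toy illustration of the microscaling idea: small contiguous blocks of values each share one scale factor, and elements snap to a tiny FP4 grid. Real NVFP4 pairs an E2M1 element format with per-block FP8 scales; this sketch shows only the blockwise-scale mechanics.

```python
# Blockwise (microscaling) quantization sketch: each block of 16 values
# shares one scale, and values snap to the E2M1 (FP4) magnitude grid.
import numpy as np

FP4_GRID = np.array([0, .5, 1, 1.5, 2, 3, 4, 6])      # E2M1 positive magnitudes
LEVELS = np.concatenate([-FP4_GRID[::-1], FP4_GRID])   # symmetric code set

def microscale_quantize(x: np.ndarray, block: int = 16) -> np.ndarray:
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        chunk = x[i:i + block]
        scale = np.abs(chunk).max() / 6.0 + 1e-12       # map block max to top code
        # Snap each scaled value to the nearest representable FP4 level.
        idx = np.abs(chunk[:, None] / scale - LEVELS[None, :]).argmin(axis=1)
        out[i:i + block] = LEVELS[idx] * scale
    return out

x = np.random.randn(64).astype(np.float32)
print(np.abs(microscale_quantize(x) - x).mean())        # blockwise quantization error
```

Sharing one scale per small block keeps the 4-bit elements accurate on tensors whose dynamic range varies locally, which is the point of microscaling formats.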