Tian Jin Profile
Tian Jin

@tjingrant

Followers
591
Following
340
Media
18
Statuses
218

PhD student @MIT_CSAIL, previously @IBMResearch, @haverfordedu.

Cambridge, Massachusetts
Joined March 2015
@tjingrant
Tian Jin
8 months
Introducing Learned Asynchronous Decoding w/ friends from MIT/Google! LLM responses often have chunks of tokens that are semantically independent. We train LLMs to identify and decode them in parallel, speeding up inference by 1.46x geomean (AlpacaEval) w/ only 1.3% quality loss.
4
15
70
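A toy sketch of the scheduling idea in the announcement above, under the assumption that the fine-tuned model wraps semantically independent spans in an explicit marker (the `<async>…</async>` tag and `fake_decode` helper below are hypothetical illustrations, not the released method):

```python
# Toy sketch (not the released implementation) of learned asynchronous
# decoding: the model is assumed to mark semantically independent spans with
# a hypothetical <async>...</async> tag; a host-side scheduler then decodes
# those spans concurrently and splices the results back in order.
import re
from concurrent.futures import ThreadPoolExecutor

ASYNC_SPAN = re.compile(r"<async>(.*?)</async>", re.S)

def fake_decode(prompt: str) -> str:
    # Stand-in for a real LLM decoding call (one request per independent span).
    return f"[decoded: {prompt}]"

def decode_with_async_spans(plan: str) -> str:
    """Decode marked-independent spans in parallel; keep the rest as-is."""
    spans = ASYNC_SPAN.findall(plan)
    with ThreadPoolExecutor() as pool:
        decoded = list(pool.map(fake_decode, spans))
    out = plan
    for span, text in zip(spans, decoded):
        out = out.replace(f"<async>{span}</async>", text, 1)
    return out

print(decode_with_async_spans(
    "Intro. <async>point A</async> <async>point B</async> Conclusion."
))
```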
@hanrui_w
Ryan Hanrui Wang
11 days
Explore Eigen Banana, our post-trained image editing model with lightning-fast speed! ⚡️
@Eigen_AI_Labs
Eigen AI
11 days
🚀 Releasing open-source Eigen-Banana-Qwen-Image-Edit: 4-second ⚡ instruction-based image edits trained on Pico-Banana-400K. Super fast with high image-editing quality. Open-source LoRA for Diffusers/DiffSynth-Studio + enterprise stack (EigenTrain/Inference/Deploy). Feel free
0
2
14
@DAlistarh
Dan Alistarh
15 days
Releasing QuTLASS v0.2: fast, end-to-end quantization-aware training (QAT) with kernel support and applications! 1. Nanochat-QAT: a fully-quantized extension of @karpathy's nanochat 2. General QAT recipe with MXFP4 forward/MXFP8 backward GEMMs 3. Transformers/vLLM integrations
1
38
157
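For intuition about the "general QAT recipe" mentioned above, here is a hedged framework-level sketch in PyTorch, not the QuTLASS CUDA kernels themselves: weights are fake-quantized to an MXFP4-like 4-bit grid with one scale per block in the forward pass, and the straight-through estimator passes gradients to the full-precision weights. The block size and scale rule are illustrative assumptions, not QuTLASS's exact format.

```python
# Hedged QAT sketch: fake-quantize to an MXFP4-like grid, train with the
# straight-through estimator. Assumptions: block size 32, absmax scaling.
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def mx_fake_quant(w: torch.Tensor, block: int = 32) -> torch.Tensor:
    flat = w.reshape(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / FP4_GRID[-1]
    # snap each scaled magnitude to the nearest representable FP4 value
    idx = (flat.abs() / scale).unsqueeze(-1).sub(FP4_GRID).abs().argmin(-1)
    q = (torch.sign(flat) * FP4_GRID[idx] * scale).reshape(w.shape)
    return w + (q - w).detach()  # straight-through estimator

w = torch.randn(4, 64, requires_grad=True)
loss = (mx_fake_quant(w) ** 2).sum()
loss.backward()  # gradients still reach the full-precision weights
print(w.grad.shape)
```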
@tjingrant
Tian Jin
17 days
Amazing work!
@SimonXinDong
X. Dong
17 days
We at NVIDIA present Length Penalty Done Right: cut CoT length by 3/4 without sacrificing accuracy, using only RL. This makes DeepSeek-R1-7B run ~8 times faster on AIME-24 while maintaining the same accuracy.
0
0
2
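The tweet above doesn't spell out the reward, so the following is only a generic illustration of RL-based length control, not the paper's formulation: correct answers keep their reward unless the chain of thought overshoots a token budget, and wrong answers are never length-penalized, which is one common way to shorten CoT without trading away accuracy. The budget and coefficient are made up.

```python
# Purely illustrative length-penalty reward shaping; budget/alpha are assumptions.
def shaped_reward(correct: bool, cot_tokens: int,
                  budget: int = 2048, alpha: float = 0.5) -> float:
    if not correct:
        return 0.0                              # no length pressure on failures
    overshoot = max(cot_tokens - budget, 0) / budget
    return 1.0 - alpha * min(overshoot, 1.0)    # penalize only verbose successes

print(shaped_reward(True, 1024), shaped_reward(True, 6144), shaped_reward(False, 6144))
```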
@dongxi_nlp
马东锡 NLP
18 days
@danielmisrael
Daniel Israel
21 days
"An hour of planning can save you 10 hours of doing." โœจ๐Ÿ“ Planned Diffusion ๐Ÿ“ โœจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร— faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ€“5% AR quality.
0
1
3
@dongxi_nlp
马东锡 NLP
18 days
Asynchronous generation + text diffusion + protocol token planning! From awesome @tjingrant @danielmisrael
2
2
10
@ellieyhc
Ellie Cheng
21 days
Diffusion 🤝 Autoregressive: fast, high-quality generation
@danielmisrael
Daniel Israel
21 days
"An hour of planning can save you 10 hours of doing." โœจ๐Ÿ“ Planned Diffusion ๐Ÿ“ โœจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร— faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ€“5% AR quality.
0
2
2
@tjingrant
Tian Jin
21 days
Plan autoregressively, denoise in parallel!
@danielmisrael
Daniel Israel
21 days
"An hour of planning can save you 10 hours of doing." โœจ๐Ÿ“ Planned Diffusion ๐Ÿ“ โœจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร— faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ€“5% AR quality.
0
2
5
@mcarbin
Michael Carbin
20 days
Earlier this year, we introduced the idea of learned asynchronous decoding. Now we've brought it to diffusion!
@danielmisrael
Daniel Israel
21 days
"An hour of planning can save you 10 hours of doing." โœจ๐Ÿ“ Planned Diffusion ๐Ÿ“ โœจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร— faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ€“5% AR quality.
0
3
15
@tjingrant
Tian Jin
21 days
Plan autoregressively, denoise in parallel!
@danielmisrael
Daniel Israel
21 days
"An hour of planning can save you 10 hours of doing." โœจ๐Ÿ“ Planned Diffusion ๐Ÿ“ โœจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร— faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ€“5% AR quality.
0
2
5
@danielmisrael
Daniel Israel
21 days
"An hour of planning can save you 10 hours of doing." โœจ๐Ÿ“ Planned Diffusion ๐Ÿ“ โœจ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8ร— faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9โ€“5% AR quality.
7
46
311
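A toy sketch of the two-phase control flow the Planned Diffusion tweets above describe: an autoregressive pass first emits a short plan that reserves independent spans, then all reserved spans are denoised together. Both model calls below are stubs with invented plan contents and span lengths; only the plan-then-parallel-fill scheduling is the point.

```python
# Toy planned-diffusion control flow: AR planning pass, then parallel fill.
def ar_plan(prompt: str) -> list[dict]:
    # Stub for the autoregressive planning pass: decide the outline and how
    # many tokens each independent span should get (contents are invented).
    return [{"topic": "definition", "length": 30},
            {"topic": "example", "length": 40},
            {"topic": "summary", "length": 20}]

def diffusion_fill(spans: list[dict]) -> list[str]:
    # Stub for the denoising pass: a real dLLM would unmask all reserved
    # spans jointly over a fixed number of steps, i.e. one batched call.
    return [f"<{s['topic']} filled in ~{s['length']} tokens>" for s in spans]

def planned_diffusion(prompt: str) -> str:
    plan = ar_plan(prompt)            # cheap sequential planning
    chunks = diffusion_fill(plan)     # every span generated in parallel
    return " ".join(chunks)

print(planned_diffusion("Explain beam search."))
```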
@SimonXinDong
X. Dong
27 days
Super bullish on intra-layer hybridization in LLMs. Here are the reasons why.
3
154
574
@NadavTimor
Nadav Timor
29 days
NYC open-source AI infra contributors: we've launched a community research hub above Grand Central where GPUs go brrr 🔥🗽 A place to hack, benchmark, and collaborate: vLLM, SGLang, kernels, inference optimizations all welcome. Open space. Open source. Weekends too. Huge
7
10
89
@ZiniuLi
Ziniu Li
1 month
🚀 Excited to share our work at Bytedance Seed! Knapsack RL: Unlocking Exploration of LLMs via Budget Allocation 🎒 Exploration in LLM training is crucial but expensive. Uniform rollout allocation is wasteful: ✅ Easy tasks → always solved → 0 gradient ❌ Hard tasks →
13
102
642
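The thread above is truncated, so the following is only a hedged illustration of the general budget-allocation idea, not the paper's algorithm: with group-based RL, a prompt whose rollouts are all correct or all wrong yields zero advantage, so extra rollouts are greedily assigned where a mixed outcome is most likely. The Bernoulli-variance heuristic, task names, and solve-rate estimates are assumptions.

```python
# Hedged sketch: spend a fixed rollout budget where mixed outcomes are likely.
import heapq

def allocate_rollouts(p_solve: dict[str, float], budget: int,
                      min_per_task: int = 2) -> dict[str, int]:
    alloc = {t: min_per_task for t in p_solve}
    remaining = budget - min_per_task * len(p_solve)
    # max-heap keyed by p*(1-p) per rollout already assigned (diminishing value)
    heap = [(-p * (1 - p) / alloc[t], t) for t, p in p_solve.items()]
    heapq.heapify(heap)
    for _ in range(max(remaining, 0)):
        _, t = heapq.heappop(heap)
        alloc[t] += 1
        p = p_solve[t]
        heapq.heappush(heap, (-p * (1 - p) / alloc[t], t))
    return alloc

# Easy/hard tasks get little beyond the minimum; ambiguous tasks soak up budget.
print(allocate_rollouts({"easy": 0.95, "medium": 0.5, "hard": 0.05}, budget=24))
```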
@DAlistarh
Dan Alistarh
1 month
Introducing LLM.Q: Quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM with natively quantized matmuls on consumer GPUs and single workstations. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
3
16
141
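To make "natively quantized matmuls" concrete, here is a NumPy-level illustration of the general technique; the real library is CUDA/C++ and its exact formats may differ. Both operands are quantized to int8 with per-tensor scales, the product is accumulated in int32, and the accumulator is rescaled back to float.

```python
# Generic int8 quantized matmul sketch (conceptual, not LLM.Q's kernels).
import numpy as np

def quantize_int8(x: np.ndarray):
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)   # integer accumulation
    return acc.astype(np.float32) * (sa * sb)          # dequantize the output

a = np.random.randn(8, 64).astype(np.float32)
b = np.random.randn(64, 16).astype(np.float32)
print(np.abs(int8_matmul(a, b) - a @ b).max())         # small quantization error
```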
@danielmisrael
Daniel Israel
2 months
🔦 Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)
11
23
170
@tjingrant
Tian Jin
2 months
Congrats Xinyu!
@Xinyu2ML
Xinyu Yang
2 months
🚀 Excited to share that #Multiverse has been accepted to #NeurIPS 2025! Couldn't have done it without such incredible collaborators. Thank you!!
0
0
1
@anneouyang
Anne Ouyang
2 months
Excited to share what friends and I have been working on at @Standard_Kernel. We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring: - Matmul 102%-105% perf
52
92
993
@DAlistarh
Dan Alistarh
2 months
🚀 Excited to announce QuTLASS v0.1.0 🎉 QuTLASS is a high-performance library for low-precision deep learning kernels, following NVIDIA CUTLASS. The new release brings 4-bit NVFP4 microscaling and fast transforms to NVIDIA Blackwell GPUs (including the B200!) [1/N]
3
35
220
@a1zhang
Alex L Zhang
2 months
All the recordings for the @GPU_MODE x @scaleml series are up as a playlist in case you missed it 😍 There's so much value in these ~8 hours of lectures, from proving quantization error bounds on a whiteboard to a deep-dive into GPU warp schedulers! Plz take advantage of it!
7
105
649
@a1zhang
Alex L Zhang
3 months
announcing the @GPU_MODE x @scaleml summer speaker series happening next week, a 5️⃣-day series where top researchers will teach about the algorithmic and systems-level advances that underpin `gpt-oss`! all content will be live-streamed & recorded for FREE on GPU MODE's YouTube!
1
45
270