Tianyuan Zhang Profile
Tianyuan Zhang

@tianyuanzhang99

Followers
2K
Following
3K
Media
20
Statuses
184

PhDing at @MIT, towards general intelligence and lifelong machine learning. M.S. from CMU, B.S. from PKU.

Boston
Joined September 2017
@tianyuanzhang99
Tianyuan Zhang
5 months
Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch
5
88
425
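For intuition, here is a minimal sketch of the large-chunk test-time-training idea described above, assuming a plain linear fast-weight memory and a single gradient step per chunk; LaCT's actual fast-weight network, loss, and update rule are not reproduced here.

```python
import torch

def large_chunk_ttt(queries, keys, values, chunk_size=2048, lr=1e-2):
    """Hypothetical sketch of large-chunk test-time training (not the official LaCT code).

    queries, keys, values: (seq_len, dim) tensors.
    A linear fast-weight memory W is read with each chunk's queries, then updated
    once per chunk by a single gradient step on a chunk-level reconstruction loss.
    """
    seq_len, dim = keys.shape
    W = torch.zeros(dim, dim, requires_grad=True)  # fast-weight memory (leaf tensor)
    outputs = []
    for start in range(0, seq_len, chunk_size):
        q = queries[start:start + chunk_size]
        k = keys[start:start + chunk_size]
        v = values[start:start + chunk_size]
        # Apply: read from the current memory for the whole chunk at once.
        outputs.append(q @ W.detach())
        # Update: one gradient step on the chunk's self-supervised loss,
        # so the update cost is amortized over a large chunk.
        loss = ((k @ W - v) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, W)
        with torch.no_grad():
            W -= lr * grad
    return torch.cat(outputs, dim=0)
```

Because the memory is only written once per large chunk, both the read and the update are big dense matmuls, which is what makes the nonlinear memory cheap on modern hardware.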
@tianyuanzhang99
Tianyuan Zhang
12 days
Totally agree, pretraining only works when the marginal cost of data is nearly zero.
@TairanHe99
Tairan He
13 days
Tesla
- collects 4.3M hours of driving data
- every day
- for free
- to train a 2DoF system (steering + throttle).
- yet full autonomy remains unsolved.
Frontier robotics startups/labs
- collect or purchase 0.01M–1M hours of data
- every X month
- for millions of dollars
- to
4
6
226
@bowei_chen_19
Bowei Chen
23 days
The Representation Autoencoders (RAE) by @sainingxie's team are fascinating — a brilliant demonstration that high-dimensional diffusion is indeed feasible. In our latest work on semantic encoders, we align a pretrained foundation encoder (e.g., DINOv2) as a visual tokenizer,
@bowei_chen_19
Bowei Chen
1 month
We found that visual foundation encoders can be aligned to serve as tokenizers for latent diffusion models in image generation! Our new paper introduces a new tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀.
2
22
233
@tianyuanzhang99
Tianyuan Zhang
1 month
Congrats! Getting such results via a completely new route!
@TianweiY
Tianwei Yin
1 month
Our new editing model just entered the top-3 on the image editing leaderboards, ahead of GPT-Image, Qwen-Edit, and Flux-Kontext 🚀 We’re taking a very different research path than most—starting with fine-grained regional editing, and aiming toward image generation that feels as
0
0
4
@bowei_chen_19
Bowei Chen
1 month
We found that visual foundation encoders can be aligned to serve as tokenizers for latent diffusion models in image generation! Our new paper introduces a new tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀.
7
72
528
@tianyuanzhang99
Tianyuan Zhang
2 months
Oh man! How did I miss this blog in the summer 😇 It derives the derivative of the Muon optimizer. Would be very interesting to try in test-time training.
@Jianlin_S
jianlin.su
5 months
https://t.co/EGIwz9VeME This post discusses the derivative calculation of the msign operator. If you are interested in the combination of “TTT + Muon”, like https://t.co/u9qW6lWqBH, this might be helpful to you.
0
1
20
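The linked post concerns the derivative of the msign operator used in Muon. As background, a common way to approximate msign is an odd-polynomial Newton–Schulz iteration, roughly like the sketch below; the coefficients and step count are illustrative assumptions, and the derivative itself (the blog's topic) is not derived here.

```python
import torch

def msign_newton_schulz(G, steps=5, eps=1e-7):
    """Illustrative Newton-Schulz approximation of the matrix sign / orthogonalization
    used by Muon-style optimizers. Coefficients are one common quintic choice and are
    an assumption here, not taken from the linked blog."""
    X = G / (G.norm() + eps)           # normalize so singular values start in a safe range
    transpose = X.shape[0] > X.shape[1]
    if transpose:                      # iterate on the wide orientation for a smaller Gram matrix
        X = X.T
    a, b, c = 3.4445, -4.7750, 2.0315
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transpose else X
```

Differentiating through an iteration like this (or through msign directly) is what a "TTT + Muon" combination would need, since the test-time update itself becomes part of the computation graph.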
@tianyuanzhang99
Tianyuan Zhang
2 months
Interesting UI with a powerful model. Taking some time to play with it!
@TianweiY
Tianwei Yin
2 months
A first look at our research vision 👀
0
0
6
@SonglinYang4
Songlin Yang
2 months
Excited to see Gated DeltaNet being adopted in the @Alibaba_Qwen series! It has also previously demonstrated strong effectiveness in @nvidia's Jet-Nemotron.
@Alibaba_Qwen
Qwen
2 months
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here! 🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. @ 32K+ context!) 🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &
9
54
552
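For readers unfamiliar with Gated DeltaNet, below is a sequential-form sketch of a gated delta rule, assumed from the Gated DeltaNet papers rather than from Qwen3-Next's code; real implementations use a chunked, hardware-efficient parallel form, and the exact gating and normalization in Qwen3-Next are not reproduced here.

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Illustrative sequential form of a gated delta rule:
        S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
        o_t = S_t q_t
    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) gates in (0, 1).
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)  # recurrent fast-weight state
    outs = []
    for t in range(T):
        kt, vt, qt = k[t], v[t], q[t]
        # Gated decay of the state combined with a delta-rule (error-correcting) write.
        S = alpha[t] * (S - beta[t] * (S @ torch.outer(kt, kt))) + beta[t] * torch.outer(vt, kt)
        outs.append(S @ qt)
    return torch.stack(outs)
```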
@tianyuanzhang99
Tianyuan Zhang
3 months
Part science, part empiricism, part magic. All driven by extreme curiosity!!
@Guangxuan_Xiao
Guangxuan Xiao
3 months
I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: https://t.co/0EAi2KQMMx
3
1
39
@tianyuanzhang99
Tianyuan Zhang
4 months
Model and training code for LaCT on language modeling, AR video generation, and novel view synthesis are released; we also have a TTT layer implementation with sequence parallelism supported. Both object-centric and scene-level view synthesis checkpoints are released 🤓 — come play!
@tianyuanzhang99
Tianyuan Zhang
5 months
Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch
3
19
116
@ShivamDuggal4
Shivam Duggal
4 months
Compression is the heart of intelligence. From Occam to Kolmogorov — shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, token budget T & target quality 𝜖, KARL finds the smallest t≤T to reconstruct it within 𝜖 🧵
14
62
356
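Read literally, the stated objective is "the smallest token count t ≤ T whose reconstruction meets the quality target 𝜖". A naive scan over budgets would look like the sketch below; this is purely illustrative of the objective (KARL learns to arrive at such a budget rather than brute-force searching), and encode, decode, and distortion are hypothetical placeholders.

```python
def smallest_sufficient_budget(image, encode, decode, distortion, T, eps):
    """Naive illustration of the adaptive-budget objective: return the smallest
    token count t <= T whose reconstruction error is within eps.
    encode/decode/distortion are hypothetical callables; KARL itself does not
    run this brute-force scan."""
    for t in range(1, T + 1):
        recon = decode(encode(image, num_tokens=t))
        if distortion(image, recon) <= eps:
            return t
    return T  # fall back to the full budget if the target quality is never met
```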
@tianyuanzhang99
Tianyuan Zhang
4 months
I feel we need both. Compression and sparsity are orthogonal, sometimes even opposites.
@wenhaocha1
Wenhao Chai
4 months
Take a look at this blog introducing sparse attention and its implementation, which I think is currently more promising than compression-based methods for long-context modeling.
2
0
28
@Haoyu_Xiong_
Haoyu Xiong
5 months
Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations ViA learns task-specific, active perceptual strategies—such as searching, tracking, and focusing—directly from human demos, enabling robust
18
95
427
@Kai__He
Kai He
5 months
🚀 Introducing UniRelight, a general-purpose relighting framework powered by video diffusion models. 🌟UniRelight jointly models the distribution of scene intrinsics and illumination, enabling high-quality relighting and intrinsic decomposition from a single image or video.
9
49
165
@Bw_Li1024
Bowen Li
5 months
"Generalization means being able to solve problems that the system hasn't been prepared for." Our latest work in #RSS2025 can automatically invent neural networks as state abstractions, which help robots generalize. Check it out here: https://t.co/RkoR5MRRJg
5
26
123
@tianyuanzhang99
Tianyuan Zhang
5 months
Just arrived in Nashville for CVPR! Looking forward to chatting about any topics!
@ZiqiPang
Ziqi Pang
5 months
Our CVPR'25 Oral - RandAR - presents a missing aspect of AR models for "𝐆𝐏𝐓 𝐌𝐨𝐦𝐞𝐧𝐭 𝐢𝐧 𝐕𝐢𝐬𝐢𝐨𝐧": beyond image gen, 𝐳𝐞𝐫𝐨-𝐬𝐡𝐨𝐭 𝐠𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐭𝐨 𝐝𝐢𝐯𝐞𝐫𝐬𝐞 𝐝𝐨𝐦𝐚𝐢𝐧𝐬 is the REAL objective, and 𝐫𝐚𝐧𝐝𝐨𝐦 𝐨𝐫𝐝𝐞𝐫 can achieve it.
0
0
13
@tianyuanzhang99
Tianyuan Zhang
5 months
Thanks Songlin and Xinyu for hosting. Here are the recording and slides.
@SonglinYang4
Songlin Yang
5 months
1
3
33
@tianyuanzhang99
Tianyuan Zhang
5 months
Happening in 5 min
@SonglinYang4
Songlin Yang
5 months
Test-time training (TTT) is an elegant framework for adapting model weights to the context. In today’s ASAP seminar (2pm Eastern Time), @tianyuanzhang99 presents Large Chunk TTT (LaCT) — a simple, efficient method combining TTT with chunked attention to unlock new opportunities.
0
1
18
@xxunhuang
Xun Huang
5 months
Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
29
142
871