
Tianzhe Chu
@TianzheC
277 Followers · 255 Following · 18 Media · 80 Statuses
Now @hkudatascience. Previous @ShanghaiTechUni, visited @UCBerkeley.
Berkeley, CA
Joined September 2022
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently from SFT? Our answer: 📉SFT Memorizes, RL Generalizes📈
tianzhechu.com
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
RT @twominutepapers: NVIDIA's AI watched 150,000 videos… and learned to relight scenes incredibly well! No game engine. No 3D software. And…
RT @danielyehhh: ❗️❗️ Can MLLMs understand scenes from multiple camera viewpoints — like humans? 🧭 We introduce All-Angles Bench — 2,100+…
#ICML2025 +1 🤔 Seems nobody really cares about paper acceptance except my mom, who asks about the status every few weeks.
RT @TongPetersb: We're open-sourcing the training code for MetaMorph! MetaMorph offers a lightweight framework for turning LLMs into unif…
RT @TongPetersb: Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: A family of billion-…
RT @DavidJFan: Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/C…
RT @docmilanfar: Statistical "degrees of freedom" (df) is in general not the same as "the # of parameters." The df for any 1-1 ('image-to-…
RT @AnthropicAI: New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens i…
Huge congrats to @simon_zhai and long live your trolls!
Just attended the dissertation talk by one of my PhD students at Berkeley, Simon Zhai. He is joining DeepMind after graduation. Congratulations!
RT @alec_helbling: Create heatmaps that localize text concepts in generated videos. We discovered that our approach, ConceptAttention, ca…
RT @YiMaTweets: This is our latest work SimDINO that, again based on the coding rate principle, significantly simplifie…
arxiv.org
DINO and DINOv2 are two model families being widely used to learn representations from unlabeled imagery data at large scales. Their learned representations often enable state-of-the-art...
RT @YiMaTweets: Our new work ToST is now available as an ICLR'25 spotlight. At least one thing DeepSeek taught us i…
robinwu218.github.io
ToST is a transformer architecture with linear-time attention that is both performant and interpretable, derived from principled compression objectives.