
Tianzhe Chu (@TianzheC)
297 Followers · 267 Following · 19 Media · 87 Statuses
Now @hkudatascience. Previously @ShanghaiTechUni; visited @UCBerkeley.
Berkeley, CA · Joined September 2022
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently from SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈 https://t.co/VqpMbv960K
6 replies · 43 reposts · 178 likes
Calling all parsimony and learning researchers 🚨🚨 The 3rd annual CPAL will be held in Tübingen, Germany, March 23–26, 2026! Check out this year's website for all the details: https://t.co/Ra08OCHmA9
0 replies · 7 reposts · 16 likes
GAUSS: General Assessment of Underlying Structured Skills in Mathematics. We're excited to launch GAUSS, a next-generation math AI benchmark built to overcome the low skill resolution of today's benchmarks. What it does: GAUSS profiles LLMs across 12 cognitive…
7 replies · 29 reposts · 157 likes
For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇
18 replies · 144 reposts · 890 likes
[3/3] Thanks to @ZitongYang0 for advice on training and @zjasper666 for help with deployment.
0 replies · 0 reposts · 2 likes
[2/3] Base model: Qwen2.5-7B-Instruct
Data: the book parsed with EntiGraph + replay data
Objective: pre-training loss
Cost: ~10k HKD on the data
Link:
1 reply · 1 repost · 3 likes
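A minimal sketch of what this [2/3] recipe might look like, assuming a standard HuggingFace-style continued pre-training loop. The dataset placeholders, replay ratio, and hyperparameters are illustrative assumptions, not the authors' actual setup.

```python
# Hypothetical sketch of the [2/3] recipe: continued pre-training of
# Qwen2.5-7B-Instruct on EntiGraph-parsed book text mixed with replay data.
# Dataset contents, replay_ratio, lr, and step count are illustrative guesses.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

book_docs = ["..."]    # EntiGraph-augmented passages from the book (placeholder)
replay_docs = ["..."]  # generic pre-training text to mitigate forgetting (placeholder)
replay_ratio = 0.3     # assumed fraction of replay data per step

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for step in range(1000):
    doc = random.choice(replay_docs if random.random() < replay_ratio else book_docs)
    batch = tokenizer(doc, return_tensors="pt", truncation=True, max_length=2048)
    # Standard next-token (pre-training) objective: labels = input_ids.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```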
[1/3]🤨 Many grad school courses teach you to train an LM, but what if we train one to teach? Here's our attempt! We build & deploy a cute 7B LM that can answer questions about the book. As it's only 7B, it won't relieve students of their mathy homework. 🤡
Our latest book on the mathematical principles of deep learning and intelligence has been released publicly at: https://t.co/ihPBCkI3x5 It also comes with a customized chatbot that helps readers study, and a Chinese version translated mainly by AI. This is an open-source project.
3 replies · 1 repost · 16 likes
🚀 Introducing UniRelight, a general-purpose relighting framework powered by video diffusion models. 🌟UniRelight jointly models the distribution of scene intrinsics and illumination, enabling high-quality relighting and intrinsic decomposition from a single image or video.
9 replies · 44 reposts · 163 likes
NVIDIA’s AI watched 150,000 videos… and learned to relight scenes incredibly well! No game engine. No 3D software. And it has an amazing cat demo. 🐱💡 Hold on to your papers! Full video: https://t.co/zjRnIqImjb
1 reply · 14 reposts · 32 likes
❗️❗️ Can MLLMs understand scenes from multiple camera viewpoints — like humans? 🧭 We introduce All-Angles Bench — 2,100+ QA pairs on multi-view scenes. 📊 We evaluate 27 top MLLMs, including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o. 🌐 Project: https://t.co/yT9aHD3fwm
2 replies · 27 reposts · 79 likes
#ICML2025 +1 🤔 It seems nobody really cares about paper acceptance except my mom, who asks about the status every few weeks.
(Quoting the "SFT Memorizes, RL Generalizes" [1/n] thread above.)
4 replies · 2 reposts · 77 likes
I'm at ICLR this week! I'll be presenting ToST, a (provably) computationally efficient high-performance deep architecture derived from information theory and convex analysis principles. 📅 Saturday April 26, 10AM-12:30PM 📌 Hall 3 + Hall 2B #145 💡Awarded a Spotlight! (1/3)
1 reply · 16 reposts · 40 likes
Will be at ICLR 2025! No paper. No plan. With camera. V me $5 and you can get an edited portrait plus an IG follower. Tariffed 245% if you pay by Zelle.
1 reply · 0 reposts · 28 likes
We're open-sourcing the training code for MetaMorph! MetaMorph offers a lightweight framework for turning LLMs into unified multimodal models: (multimodal) tokens -> transformers -> diffusion -> pixels! This is our best take on unified modeling as of November 2024, and…
How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph, a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Only a modest amount of generation data is needed to elicit…
4 replies · 38 reposts · 197 likes
Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: A family of billion-scale SSL vision models (up to 7B parameters) trained on billions of images without language supervision, using VQA to evaluate the learned representation.
8 replies · 87 reposts · 480 likes
Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.
12 replies · 97 reposts · 459 likes
Statistical "degrees of freedom" (df) is in general not the same as "the # of parameters." The df for any 1-1 ('image-to-image') model ŷ = 𝐟(y) : ℝⁿ → ℝⁿ is the trace of its Jacobian: df = div[𝐟(y)] = Trace[∇𝐟(y)] 1/n
9 replies · 56 reposts · 652 likes
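A tiny numerical check of this identity, using a toy linear map (an illustration, not anything from the thread): for 𝐟(y) = W y the Jacobian is W, so df = Trace[W].

```python
# Degrees of freedom as the trace of the Jacobian: df = div f(y) = tr(∇f(y)).
# Toy linear "image-to-image" model f(y) = W y, so the Jacobian is W.
import torch

n = 5
W = torch.randn(n, n)

def f(y):
    return W @ y

y = torch.randn(n)
J = torch.autograd.functional.jacobian(f, y)  # n x n Jacobian of f at y
print(torch.trace(J).item(), torch.trace(W).item())  # identical for linear f

# In high dimensions the trace is usually estimated with Hutchinson probes,
# df ≈ E[ vᵀ ∇f(y) v ] over random v. One probe via a vector-Jacobian product:
v = torch.randn(n)
y = y.requires_grad_(True)
(jtv,) = torch.autograd.grad(f(y) @ v, y)  # Jᵀ v
print((jtv @ v).item())  # unbiased (noisy) single-probe estimate of df
```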
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.
180 replies · 1K reposts · 9K likes
it feels like @ylecun is going through his decades-old ideas and re-introducing them one at a time 😂 was the optimal alpha = 2/3 and the optimal gamma = 1.7159? 🤣🤣🤣
12 replies · 43 reposts · 586 likes
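For context, those constants appear to reference the scaled sigmoid recommended in LeCun et al.'s "Efficient BackProp": f(x) = 1.7159 · tanh(2x/3), calibrated so that f(±1) ≈ ±1. A one-line sanity check (my reading of the joke, not stated in the tweet):

```python
# The activation the joke seems to reference (Efficient BackProp):
# f(x) = 1.7159 * tanh(2x/3), with gamma = 1.7159 and alpha = 2/3.
import math

def lecun_tanh(x: float) -> float:
    return 1.7159 * math.tanh(2.0 * x / 3.0)

print(lecun_tanh(1.0))  # ≈ 1.0 by construction
```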