Tianzhe Chu

@TianzheC

Followers 297 · Following 267 · Media 19 · Statuses 87

Now @hkudatascience. Previously @ShanghaiTechUni; visited @UCBerkeley.

Berkeley, CA
Joined September 2022
@TianzheC
Tianzhe Chu
8 months
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently than SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈 https://t.co/VqpMbv960K
6
43
178
@CPALconf
Conference on Parsimony and Learning (CPAL)
16 days
Calling all parsimony and learning researchers 🚨🚨 The 3rd annual CPAL will be held in Tübingen, Germany, March 23–26, 2026! Check out this year's website for all the details https://t.co/Ra08OCHmA9
0
7
16
@zjasper666
Jasper
27 days
GAUSS: General Assessment of Underlying Structured Skills in Mathematics. We’re excited to launch GAUSS, a next-generation math AI benchmark built to overcome the limitations of low skill resolution in today’s benchmarks. What it does: GAUSS profiles LLMs across 12 cognitive
7
29
157
@jyo_pari
Jyo Pari
1 month
For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇
18
144
890
@TianzheC
Tianzhe Chu
1 month
[3/3] Thanks to @ZitongYang0 for advice on training and @zjasper666 for help on deployment.
0
0
2
@TianzheC
Tianzhe Chu
1 month
[2/3]
Base model: Qwen2.5-7B-Instruct
Data: the book parsed with EntiGraph + replay data
Objective: pre-training loss
Cost: ~10k HKD on the data
Link:
1
1
3
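A minimal sketch of the continued pre-training setup described in [2/3] above, assuming Hugging Face transformers and datasets. The file names (entigraph_book.jsonl, replay.jsonl) and every hyperparameter are illustrative placeholders, not the thread's actual configuration; only the base model and the plain next-token (pre-training) objective come from the post.

```python
# Hedged sketch: continued pre-training of Qwen2.5-7B-Instruct on book-derived
# text plus replay data, with a standard next-token (pre-training) loss.
# File names, data fields, and hyperparameters below are illustrative placeholders.
from datasets import load_dataset, concatenate_datasets
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16")

# Book text expanded with EntiGraph-style synthetic passages, mixed with generic
# "replay" text to limit forgetting (both jsonl files are hypothetical and are
# assumed to contain a "text" field).
book = load_dataset("json", data_files="entigraph_book.jsonl", split="train")
replay = load_dataset("json", data_files="replay.jsonl", split="train")
corpus = concatenate_datasets([book, replay]).shuffle(seed=0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True, remove_columns=corpus.column_names)

# mlm=False gives the causal-LM objective, i.e. the plain pre-training loss.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2.5-7b-book",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=2,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Mixing replay text into the book-derived EntiGraph passages is the usual way to limit forgetting during continued pre-training; the exact mixing ratio is not stated in the thread.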
@TianzheC
Tianzhe Chu
1 month
[1/3]🤨Many grad school courses teach you to train an LM, but what if we train one to teach? Here is our attempt! We build & deploy a cute 7B LM that can answer questions regarding the book. As it's only 7B, it will not relieve students from mathy homework.🤡
@YiMaTweets
Yi Ma
2 months
Our latest book on the mathematical principles of deep learning and intelligence has been released publicly at: https://t.co/ihPBCkI3x5 It also comes with a customized Chatbot that helps readers study and a Chinese version translated mainly by AI. This is an open-source project.
3
1
16
@Kai__He
Kai He
4 months
🚀 Introducing UniRelight, a general-purpose relighting framework powered by video diffusion models. 🌟UniRelight jointly models the distribution of scene intrinsics and illumination, enabling high-quality relighting and intrinsic decomposition from a single image or video.
9
44
163
@twominutepapers
Two Minute Papers
4 months
NVIDIA’s AI watched 150,000 videos… and learned to relight scenes incredibly well! No game engine. No 3D software. And it has an amazing cat demo. 🐱💡 Hold on to your papers! Full video: https://t.co/zjRnIqImjb
1
14
32
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
❗️❗️ Can MLLMs understand scenes from multiple camera viewpoints — like humans? 🧭 We introduce All-Angles Bench — 2,100+ QA pairs on multi-view scenes. 📊 We evaluate 27 top MLLMs, including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o. 🌐 Project: https://t.co/yT9aHD3fwm
2
27
79
@TianzheC
Tianzhe Chu
5 months
#ICML2025 +1 🤔Seems nobody really cares about paper acceptance except my mom, who asked about the status every few weeks
@druv_pai
Druv Pai
6 months
I'm at ICLR this week! I'll be presenting ToST, a (provably) computationally efficient high-performance deep architecture derived from information theory and convex analysis principles. 📅 Saturday April 26, 10AM-12:30PM 📌 Hall 3 + Hall 2B #145 💡Awarded a Spotlight! (1/3)
1
16
40
@TianzheC
Tianzhe Chu
6 months
Will be at ICLR 2025! No paper, no plan, with camera. V me $5 and you can get an edited portrait plus an IG follower. Tariffed 245% if you pay by Zelle
1
0
28
@TongPetersb
Peter Tong
6 months
We're open-sourcing the training code for MetaMorph! MetaMorph offers a lightweight framework for turning LLMs into unified multimodal models: (multimodal) tokens -> transformers -> diffusion -> pixel! This is our best take on unified modeling as of November 2024, and
@liuzhuang1234
Zhuang Liu
10 months
How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Very moderate generation data is needed to elicit
4
38
197
@TianzheC
Tianzhe Chu
6 months
Won’t use a model that rejected me twice anymore! @AIatMeta Let’s go Qwen
0
0
11
@TongPetersb
Peter Tong
6 months
Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: A family of billion-scale SSL vision models (up to 7B parameters) trained on billions of images without language supervision, using VQA to evaluate the learned representation.
8
87
480
@DavidJFan
David Fan
6 months
Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.
12
97
459
@docmilanfar
Peyman Milanfar
6 months
Statistical "degrees of freedom" (df) is in general not the same as "the # of parameters." The df for any 1-1 ('image-to-image') model ŷ = 𝐟(y) : ℝⁿ → ℝⁿ is the trace of its Jacobian: df = div[𝐟(y)] = Trace[∇𝐟(y)] (1/n)
9
56
652
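A minimal numerical sketch of the quantity in the post above, df = Trace[∇𝐟(y)], estimated with a Hutchinson-style randomized trace (divergence) estimator, assuming PyTorch. The map f below is a toy stand-in, not any model referenced in the thread.

```python
# Hedged sketch: estimating df = Trace[∇f(y)] (the divergence of f) for an
# image-to-image map f: R^n -> R^n with Hutchinson's randomized trace estimator,
#   Trace[J] ≈ E_v[ vᵀ J v ],  v ~ Rademacher.
# The map `f` is an arbitrary toy denoiser, chosen so the exact trace is known.
import torch

def f(y: torch.Tensor) -> torch.Tensor:
    # Toy smooth nonlinear map from R^n to R^n (stand-in "denoiser").
    return torch.tanh(y) + 0.1 * y

def estimate_df(f, y: torch.Tensor, n_probes: int = 64) -> float:
    """Monte Carlo estimate of Trace[∇f(y)] via Jacobian-vector products."""
    estimates = []
    for _ in range(n_probes):
        # Rademacher probe vector with entries in {-1, +1}.
        v = torch.randint(0, 2, y.shape).to(y.dtype) * 2 - 1
        # jvp returns (f(y), J v); the single-probe estimate is vᵀ (J v).
        _, jv = torch.autograd.functional.jvp(f, (y,), (v,))
        estimates.append(torch.dot(v.flatten(), jv.flatten()))
    return torch.stack(estimates).mean().item()

if __name__ == "__main__":
    n = 1024
    y = torch.randn(n)
    df_mc = estimate_df(f, y)
    # For this separable toy map, the Jacobian is diagonal, so the exact trace
    # is sum_i (1 - tanh(y_i)^2 + 0.1).
    df_exact = (1.0 - torch.tanh(y) ** 2 + 0.1).sum().item()
    print(f"Hutchinson estimate: {df_mc:.1f}, exact: {df_exact:.1f}")
```

Because the toy map is separable, the exact trace is available in closed form, which makes it easy to check that the estimate tightens as the number of probes grows.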
@AnthropicAI
Anthropic
7 months
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.
180
1K
9K
@kchonyc
Kyunghyun Cho
7 months
it feels like @ylecun is going through his decades-old ideas and re-introducing them one at a time 😂 was the optimal alpha = 2/3 and the optimal gamma = 1.7159? 🤣🤣🤣
@liuzhuang1234
Zhuang Liu
7 months
New paper - Transformers, but without normalization layers (1/n)
12
43
586
@_lewtun
Lewis Tunstall @ COLM 🍁
7 months
Definitive proof that Google Search is unbiased
12
12
431