
Tianzhe Chu
@TianzheC
277 Followers · 255 Following · 18 Media · 80 Statuses
Now @hkudatascience. Previous @ShanghaiTechUni, visited @UCBerkeley.
Berkeley, CA
Joined September 2022
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently from SFT? Our answer: 📉SFT Memorizes, RL Generalizes📈
tianzhechu.com
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
RT @twominutepapers: NVIDIA's AI watched 150,000 videos… and learned to relight scenes incredibly well! No game engine. No 3D software. And…
RT @danielyehhh: ❗️❗️ Can MLLMs understand scenes from multiple camera viewpoints — like humans? 🧭 We introduce All-Angles Bench — 2,100+…
#ICML2025 +1 🤔 Seems nobody really cares about paper acceptance except my mom, who asks about the status every few weeks.
RT @TongPetersb: We're open-sourcing the training code for MetaMorph! MetaMorph offers a lightweight framework for turning LLMs into unif…
RT @TongPetersb: Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: A family of billion-…
RT @DavidJFan: Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/C…
RT @docmilanfar: Statistical "degrees of freedom" (df) is in general not the same as "the # of parameters." The df for any 1-1 ('image-to-…
RT @AnthropicAI: New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens i…
Huge congrats to @simon_zhai and long live your trolls!
Just attended the dissertation talk by one of my PhD students at Berkeley, Simon Zhai. He is joining DeepMind after graduation. Congratulations!
RT @alec_helbling: Create heatmaps that localize text concepts in generated videos. We discovered that our approach, ConceptAttention, ca…
RT @YiMaTweets: This is our latest work SimDINO that, again based on the coding rate principle, significantly simplifie…
arxiv.org
DINO and DINOv2 are two model families being widely used to learn representations from unlabeled imagery data at large scales. Their learned representations often enable state-of-the-art...
RT @YiMaTweets: Our new work ToST is now available as an ICLR'25 spotlight. At least one thing DeepSeek taught us i…
robinwu218.github.io
ToST is a transformer architecture with linear-time attention that is both performant and interpretable, derived from principled compression objectives.