Matthew Yang Profile
Matthew Yang

@matthewyryang

Followers: 47 · Following: 51 · Media: 7 · Statuses: 27

MSML student @ CMU

Joined August 2024
@_AndrewZhao
Andrew Zhao
2 months
paper of the day
13
32
532
@thegautamkamath
Gautam Kamath
2 months
Anyone who's done a PhD knows the feeling
4
5
128
@real_ZheyuanHu
Zheyuan Hu
2 months
Introducing RaC: a data collection protocol that boosts data efficiency by 10x compared to some of the best imitation results. Key idea: systematically scale recovery & correction data => policies can reset and retry while acting (consistent self-correction) => better performance. 🧵0/N
11
38
210
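A minimal sketch of what a recovery-and-correction collection loop could look like, assuming a simulator-style environment and a human operator interface; every name here (collect_episode, demonstrate_recovery, the env.step signature) is an illustrative placeholder, not RaC's actual code:

```python
# Hypothetical sketch of an RaC-style collection loop: besides clean
# demos, let the policy drift into failure, record the operator's
# recovery + correction segment, then resume. All names are made up.

def collect_episode(env, operator, policy):
    obs = env.reset()
    segments, done = [], False
    while not done:
        obs, done, failing = env.step(policy(obs))
        if failing:
            # record how the operator recovers and corrects from this
            # state; these segments teach the policy to reset + retry
            segments.append(operator.demonstrate_recovery(env))
    return segments  # recovery/correction data added to the training set
```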
@aviral_kumar2
Aviral Kumar
2 months
🚨🚨New paper on core RL: a way to train value-functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried. 🧵⬇️
11
83
708
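The core idea as stated in the tweet is flow-matching on a scalar Q-value rather than on text or images. A minimal sketch under a rectified-flow assumption (linear noise-to-target path); the architecture, names, and hyperparameters are illustrative, not the paper's method:

```python
import torch
import torch.nn as nn

# Sketch: learn a velocity field v(q_t, t, s, a) that transports noise
# q_0 ~ N(0,1) to the target Q-value. At inference we integrate the ODE
# over several steps, which is where "iterative compute" comes from.

class ScalarFlow(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q_t, t, obs, act):
        # inputs: noisy scalar q_t, time t in [0,1], state-action features
        return self.net(torch.cat([q_t, t, obs, act], dim=-1))

def flow_matching_loss(model, obs, act, q_target):
    # linear interpolation path between noise and target (rectified flow)
    q0 = torch.randn_like(q_target)
    t = torch.rand_like(q_target)
    q_t = (1 - t) * q0 + t * q_target
    v_target = q_target - q0          # constant velocity along the path
    v_pred = model(q_t, t, obs, act)
    return ((v_pred - v_target) ** 2).mean()

@torch.no_grad()
def predict_q(model, obs, act, steps=8):
    # Euler integration: more steps = more test-time compute
    q = torch.randn(obs.shape[0], 1)
    for i in range(steps):
        t = torch.full((obs.shape[0], 1), i / steps)
        q = q + model(q, t, obs, act) / steps
    return q
```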
@setlur_amrith
Amrith Setlur
2 months
Nice to see ideas in our e3 paper (https://t.co/tUAKAqDO05), chaining asymmetries to learn meta-behaviors, also at work on didactic tasks!
@lifan__yuan
Lifan Yuan
2 months
🧩 New blog: From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
Do LLMs learn new skills through RL, or just activate existing patterns? Answer: RL teaches the powerful meta-skill of composition when properly incentivized. 🔗: https://t.co/4Ud8qsYrOT
0
3
23
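The f(g(x)) framing can be made concrete with a toy example: if a model separately masters skills f and g, the question is whether RL can incentivize composing them on inputs where the composition was never demonstrated. A trivial illustration, with the skills chosen arbitrarily:

```python
# Toy illustration of the f(g(x)) framing: two atomic "skills" and the
# composed task. The blog's claim is about LLMs composing learned
# skills under RL; this just makes the notation concrete.
def g(xs):                 # skill 1: sort a list
    return sorted(xs)

def f(xs):                 # skill 2: reverse a list
    return list(reversed(xs))

# the composed behavior f(g(x)): sort, then reverse (descending sort),
# a task neither skill performs on its own
assert f(g([3, 1, 2])) == [3, 2, 1]
```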
@Alibaba_Wan
Wan
4 months
🚀 Introducing Wan2.2: The World's First Open-Source MoE-Architecture Video Generation Model with Cinematic Control! 🔥 Key Innovations:
• World's First Open-Source MoE Video Model: Our Mixture-of-Experts architecture scales model capacity without increasing computational…
84
311
2K
@alexandr_wang
Alexandr Wang
5 months
I’m excited to be the Chief AI Officer of @Meta, working alongside @natfriedman, and thrilled to be accompanied by an incredible group of people joining on the same day. Towards superintelligence 🚀
1K
2K
23K
@setlur_amrith
Amrith Setlur
5 months
Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over correct responses sampled by the pretrained LLM OR is it exploring and discovering new strategies 🤔? Find answers in our latest post ⬇️ https://t.co/WCEq3K4dB0
pinnate-flare-8f3.notion.site
Amrith Setlur and Aviral Kumar, Carnegie Mellon University
2
30
154
@aviral_kumar2
Aviral Kumar
5 months
Our view on test-time scaling has been to train models to discover algos that enable them to solve harder problems. @setlur_amrith & @matthewyryang's new work e3 shows how RL done with this view produces best <2B LLM on math that extrapolates beyond training budget. 🧵⬇️
2
32
182
@setlur_amrith
Amrith Setlur
5 months
Introducing e3 🔥 Best <2B model on math 💪 Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩.? Is RL only sharpening the base LLM distrib. 🤔 OR discovering novel strategies outside base LLM 💡? We answer these ⤵️ 🚨 https://t.co/xbLULWYTmM 🚨 https://t.co/xuruZtQ6BA
1
24
96
@matthewyryang
Matthew Yang
5 months
Without matching prompts to budget:
- Too little budget for hard prompts → kills exploration early
- Too much budget for easy prompts → over-exploratory behavior
Blue: fixed data mixture. Green: fixed training budget. Black: coupled curriculum (e3). 🧵[7/8]
1
0
1
@matthewyryang
Matthew Yang
5 months
Ingredient #3: Coupled Curriculum
To fully unlock in-context exploration, RL must operate in the right mode: not (i) sharpening known responses, but (ii) chaining new ones. This requires coupling the right prompts with the right budget during training. 🧵[6/8]
1
0
0
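A minimal sketch of the coupling idea, assuming difficulty is proxied by a measured pass rate per prompt; the tier thresholds and token budgets below are invented for illustration, not e3's actual schedule:

```python
# Hypothetical prompt/budget coupling: easy prompts get short budgets
# (avoiding over-exploration), hard prompts get room to chain attempts
# in-context (avoiding early truncation). Numbers are illustrative.

def assign_budget(pass_rate):
    if pass_rate > 0.6:    # easy: model usually solves it quickly
        return 2048
    elif pass_rate > 0.2:  # medium
        return 4096
    return 8192            # hard: needs room to explore and retry

def build_curriculum(prompts, pass_rates):
    # a training stage pairs each prompt with its matched budget,
    # ordered so short-budget (easy) prompts come first
    coupled = [(p, assign_budget(r)) for p, r in zip(prompts, pass_rates)]
    return sorted(coupled, key=lambda pb: pb[1])
```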
@matthewyryang
Matthew Yang
5 months
Ingredient #2: Negative Gradient Chaining leads to in-context exploration, but how can we incentivize it? Here comes the "negative gradient" in RL, which reduces the probability of EOS in favor of continuation and trying new stuff. 🧵[5/8]
1
0
0
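A minimal sketch of where the negative gradient comes from in a REINFORCE-style objective with 0/1 rewards; the function and shapes are illustrative, not the paper's training code:

```python
import torch
import torch.nn.functional as F

# Sketch of the "negative gradient" effect: an incorrect response gets
# advantage < 0 after baseline subtraction, so the update lowers the
# log-probs of its tokens, including the EOS that ended it, shifting
# mass toward continuing and trying new steps instead of stopping.

def pg_loss(logits, tokens, advantage):
    # logits: (T, V) per-step vocab logits; tokens: (T,) sampled ids;
    # advantage: scalar, e.g. in {-1, +1} for a 0/1-reward task
    logp = F.log_softmax(logits, dim=-1)
    tok_logp = logp[torch.arange(tokens.shape[0]), tokens]
    # minimizing -A * log p pushes token (and EOS) probability down
    # whenever A < 0
    return -(advantage * tok_logp).mean()
```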
@matthewyryang
Matthew Yang
5 months
Ingredient #1: Asymmetries
Asymmetries = differences in competence across base model capabilities (e.g., verification ≠ generation)
✅ Models w/ asymmetries learn to explore by chaining them, leading to longer responses
❌ Models w/o asymmetries do not 🧵[4/8]
1
0
0
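One way to picture chaining an asymmetry, assuming verification is more reliable than generation: generate, self-verify, and retry within a single long response. `generate_step` and `verify` below are hypothetical stand-ins for model calls, not a real API:

```python
# Hypothetical sketch of chaining a generation/verification asymmetry:
# each failed, self-detected attempt stays in context, so later
# attempts can build on it, which is what lengthens the responses.

def chained_response(prompt, generate_step, verify, max_tries=4):
    trace = []
    for _ in range(max_tries):
        candidate = generate_step(prompt, context=trace)
        trace.append(candidate)
        if verify(prompt, candidate):   # checking is easier than generating
            return candidate, trace     # stop once a self-check passes
    return candidate, trace             # budget exhausted: return last try
```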
@matthewyryang
Matthew Yang
5 months
Our result? A SOTA < 2B model on AIME and HMMT’25 that extrapolates to 2x the training budget! We teach models to scale their reasoning with test-time compute 📈 using three key ingredients: 🧵[3/8]
1
0
0
@matthewyryang
Matthew Yang
5 months
The ultimate promise of test-time scaling is extrapolation: the ability of LLMs to improve as they reason for longer than they were trained. Most open-source models flat-line when test-time compute increases: more tokens, same performance 💔 They just can't extrapolate 😔
1
0
0
@matthewyryang
Matthew Yang
5 months
🚨 NEW PAPER: What if LLMs could tackle harder problems - not by explicitly training on longer traces, but by learning how to think longer? Our recipe e3 teaches models to explore in-context, enabling LLMs to unlock longer reasoning chains without ever seeing them in training.
1
5
10
@QGallouedec
Quentin Gallouédec
7 months
🤔 How do you explain that when we apply RL to math problems, the incorrect answers become longer than the correct ones? We had this discussion this morning, and I'm curious to know what the community thinks about it.
38
20
185
@maxescu
Alex Patrascu
8 months
Be water, my friend
77
270
2K