Eran Malach (@EranMalach)
Apple · Joined December 2019
626 Followers · 106 Following · 21 Media · 95 Statuses
2 months
RT @orvieto_antonio: We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. It's surprising…
2 months
RT @MOSS_workshop: We are extending the deadline to May 26th 4:59pm PDT (11:59pm UTC). Thank you everyone for your interest & inquiries; we…
3 months
RT @SurbhiGoel_: Super excited to announce our ICML workshop on highlighting the power (and limitations?) of small-scale in the era of larg…
3 months
RT @MOSS_workshop: Announcing the 1st Workshop on Methods and Opportunities at Small Scale (MOSS) at @icmlconf 2025! 🔗 Website: https://t.c…
3 months
RT @BingbinL: Excited to announce MOSS, our ICML workshop focused on discoveries at small scale! We believe there's tremendous potential &…
4 months
RT @natolambert: The best part of RLs focus in post-training right now is that the elicitation idea of post-training is a much better match…
4 months
Finally, we also observe transfer between math datasets after RL fine-tuning: running RL on the GSM8K (grade-school math) training set significantly improves performance on the much harder MATH-500 evaluation.
4 months
Additionally, we demonstrate that the optimal strategy depends on model size. Small models struggle with arithmetic, so they prefer to use code (which is executed externally). Larger models, however, converge to a strategy that uses natural-language text.
4 months
While RL typically chooses the “best” (highest-accuracy) strategy, we also discover failure cases where the initial distribution biases RL towards the “wrong” strategy, causing accuracy to eventually collapse.
4 months
Our setting: we pretrain small (<=1B) models on different mixtures of math datasets which use different solution strategies (code or text), then run RL. We observe that while the “base” model generates a mixture of strategies, RL converges to a single dominant strategy.
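The convergence to a single dominant strategy can be illustrated with a deliberately tiny sketch (my illustration, not the paper's actual setup): a REINFORCE-style softmax policy choosing between two hypothetical solution strategies, "code" and "text", with made-up per-strategy accuracies. Rewarding correct answers concentrates nearly all probability on whichever strategy succeeds more often:

```python
import math
import random

# Toy illustration only: the accuracy numbers are hypothetical,
# not taken from the paper.
ACC = {"code": 0.7, "text": 0.4}  # chance a sampled solution is correct


def softmax(logits):
    m = max(logits.values())
    exp = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def train(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = {"code": 0.0, "text": 0.0}  # "base model": a 50/50 mixture
    for _ in range(steps):
        probs = softmax(logits)
        strat = rng.choices(list(probs), weights=list(probs.values()))[0]
        reward = 1.0 if rng.random() < ACC[strat] else 0.0
        baseline = sum(probs[k] * ACC[k] for k in probs)  # expected reward
        adv = reward - baseline
        # REINFORCE: d/d_logit[k] log pi(strat) = 1[k == strat] - pi(k)
        for k in logits:
            logits[k] += lr * adv * ((1.0 if k == strat else 0.0) - probs[k])
    return softmax(logits)


probs = train()
print(probs)  # nearly all mass ends up on the higher-accuracy "code" strategy
```

Starting the same loop from a skewed initial mixture (logits heavily favoring "text") also reproduces the failure mode from the tweet above: the policy can lock into the lower-accuracy strategy before it ever explores the better one.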
4 months
How does RL improve performance on math reasoning? Studying RL from pretrained models is hard, as behavior depends on the choice of base model. 🚨 In our new work, we train models *from scratch* to study the effect of the data mix on the behavior of RL.
4 months
To backtrack or not to backtrack? The answer depends on the nature of the reasoning problem! Check out our new paper, led by @sunnytqin, with @elmelis and @SamyJelassi. See thread below 👇
arxiv.org: Recent advancements in large language models have significantly improved their reasoning abilities, particularly through techniques involving search and backtracking. Backtracking naturally scales...
@elmelis
David Alvarez Melis
4 months
🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs. It's praised for helping models "fix mistakes" and improve reasoning, but is it really the best use of test-time compute? 🤔
8 months
RT @KempnerInst: The 4:30pm poster session today at #NeurIPS2024 will feature "The Evolution of Statistical Induction Heads: In-Context Lea….
8 months
Presenting this work at #NeurIPS2024 today, 4:30pm session (poster #4807, east). Come by to hear about auto-regressive decision trees for language modeling!
@yule_gan
Yulu Gan
9 months
New paper at #NeurIPS2024! In which we try to make a *small yet interpretable* model work. We use decision trees, which offer a fully transparent decision-making process, in an autoregressive manner to do language tasks. paper: (1/n)
8 months
Will be presenting this work at #NeurIPS2024, today 11am, poster #2311. Come visit us!
@nsaphra
Naomi Saphra
1 year
Modern generative models are trained to imitate human experts, but can they actually beat those experts? Our new paper uses imitative chess agents to explore when a model can "transcend" its training distribution and outperform every human it's trained on.