Explore tweets tagged as #OverOptimization
@harshit_sikchi
Harshit Sikchi (at RLC 25)
1 year
Direct alignment algorithms (DAAs) are fast and have easy-to-tune hyperparameters, but they still suffer from a form of reward overoptimization*. We study this in detail 👇
Tweet media one
1
4
24
@MarioJooss
Mario Joos
7 months
Here's the truth about overoptimization on YouTube.
Tweet media one
9
14
158
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 year
After the LLaMa 3.1 release and ICML, I want to highlight our paper "Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms". TL;DR: we explore the dynamics of over-optimization in DPO/IPO/SLiC and find similar "reward hacking" issues as in online RLHF.👇
Tweet media one
2
47
251
@otrasenda_AC
Oliver Lopez-Corona (Tecozcacuauhtli)
1 year
Overoptimization. Is there a hard limit?
Tweet media one
0
0
3
@arnavbathla20
Arnav Bathla
10 months
Overoptimization is just optimization. Micromanaging is just managing. Overreacting is just reacting.
Tweet media one
1
0
7
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 years
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF. abs: Proposes that the root cause of reward overfitting and overoptimization in RLHF is the inadequacy of the cross-entropy loss for long-tailed preference datasets. This
Tweet media one
0
26
170
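As a rough sense of what the "iterative data smoothing" in the title above could look like in code, here is a minimal, hedged sketch in Python, assuming the idea is to soften hard preference labels toward the reward model's own predicted preference probabilities between epochs. The toy linear model, the alpha mixing weight, and the update rule are illustrative assumptions, not the paper's exact algorithm.

import torch

# Toy setup (assumed): a linear reward model over fixed feature vectors.
torch.manual_seed(0)
features_chosen = torch.randn(64, 16)
features_rejected = torch.randn(64, 16)
reward_model = torch.nn.Linear(16, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Start from hard labels: "chosen" is preferred with probability 1.
soft_labels = torch.ones(64)
alpha = 0.1  # assumed mixing weight for the smoothing update

for epoch in range(5):
    # Bradley-Terry probability that "chosen" beats "rejected".
    p_chosen = torch.sigmoid(reward_model(features_chosen).squeeze(-1)
                             - reward_model(features_rejected).squeeze(-1))
    # Cross-entropy against the current (soft) labels instead of hard 0/1 labels.
    loss = -(soft_labels * torch.log(p_chosen + 1e-8)
             + (1 - soft_labels) * torch.log(1 - p_chosen + 1e-8)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Smooth the labels toward the model's own predictions (assumed update rule).
    with torch.no_grad():
        soft_labels = (1 - alpha) * soft_labels + alpha * p_chosen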
@nntaleb
Nassim Nicholas Taleb
7 months
The antifragility of a system comes from the mortality of its components; immortality blocks evolution. Work for the immortality of the collective. [On top of my disgust for non-stoical neurotic overoptimization]. h/t @Gregoresate
Tweet media one
@nntaleb
Nassim Nicholas Taleb
7 months
@bryan_johnson Looks like you didn't understand much from Skin in the Game. It states that we are not supposed to be immortal; only our genes. This is aside from, in my general work, the contempt, perhaps even disgust I have for your brand of non-stoical neurotic overoptimization.
112
211
2K
@papers_anon
PapersAnon
1 year
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-squared Preference Optimization. One-line change to DPO to implement the principle of pessimism to alleviate overoptimization. No models tested. Potential paper there. Links below
Tweet media one
1
1
11
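A hedged reading of the "one-line change" mentioned above, as a Python sketch: my recollection is that the change swaps the log-ratio link inside the DPO objective for phi(z) = z + log z, mixing chi-squared and KL regularization. Treat that exact form, and the names below, as assumptions rather than the paper's definitive recipe.

import torch
import torch.nn.functional as F

def dpo_style_loss(logratio_chosen, logratio_rejected, beta=0.1, chi2=False):
    """logratio_* = log pi(y|x) - log pi_ref(y|x), summed over response tokens.
    chi2=False gives the standard DPO loss; chi2=True applies the assumed
    'one-line change': replace the log-ratio link log(z) with z + log(z)."""
    if chi2:
        link_chosen = torch.exp(logratio_chosen) + logratio_chosen
        link_rejected = torch.exp(logratio_rejected) + logratio_rejected
    else:
        link_chosen, link_rejected = logratio_chosen, logratio_rejected
    return -F.logsigmoid(beta * (link_chosen - link_rejected)).mean()

# Same inputs, two losses: only the link function changes.
lr_c, lr_r = torch.tensor([0.3, -0.1]), torch.tensor([-0.2, -0.4])
print(dpo_style_loss(lr_c, lr_r), dpo_style_loss(lr_c, lr_r, chi2=True))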
@fly51fly
fly51fly
1 year
[LG] Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms. R Rafailov, Y Chittepu, R Park, H Sikchi. [Stanford University & UMass Amherst] (2024). - Direct Alignment Algorithms (DAAs) like Direct Preference Optimization have
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
10
34
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 years
Confronting Reward Model Overoptimization with Constrained RLHF. abs: Studies reward overoptimization for composite reward models and evaluates various constrained RLHF approaches that maximize reward scores until they reach "proxy points"
Tweet media one
2
11
73
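A toy Python illustration of the "proxy point" idea in the tweet above, under one reading: with a composite reward built from several component models, stop crediting a component once it passes the point beyond which the proxy is no longer trusted. The clamping-based combination and the names below are assumptions for illustration; the paper itself studies constrained-RLHF formulations rather than this simple cap.

def composite_reward(component_rewards, proxy_points, weights=None):
    """Combine component reward scores, capping each at its proxy point so the
    policy is not pushed past the region where the proxy stays trustworthy.
    component_rewards / proxy_points: dicts keyed by component name (assumed)."""
    weights = weights or {name: 1.0 for name in component_rewards}
    total = 0.0
    for name, r in component_rewards.items():
        capped = min(r, proxy_points[name])  # no credit beyond the proxy point
        total += weights[name] * capped
    return total

# Example: the "helpfulness" component has already hit its proxy point,
# so further gains on it no longer increase the training reward.
print(composite_reward({"helpfulness": 3.2, "safety": 0.4},
                       {"helpfulness": 2.5, "safety": 1.0}))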
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 years
Reward Model Ensembles Help Mitigate Overoptimization. abs: RLHF can struggle with overoptimization, where the policy gets better according to the learned reward model but its true reward is actually worse. Building off Gao et al. 2023, here it is
Tweet media one
Tweet media two
0
13
71
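One common way an ensemble is used for this, sketched in Python under assumptions: train several reward models and optimize a conservative aggregate, such as the ensemble minimum or the mean penalized by disagreement, so the policy cannot exploit one model's idiosyncratic errors. The aggregation modes and names below are illustrative, not necessarily the paper's exact objectives.

import torch

def conservative_ensemble_reward(rewards, mode="mean_minus_std", penalty=1.0):
    """rewards: tensor of shape (n_models, batch) with each member's score.
    Returns a per-example reward that is pessimistic where members disagree."""
    if mode == "min":
        return rewards.min(dim=0).values  # worst case over the ensemble
    if mode == "mean_minus_std":
        return rewards.mean(dim=0) - penalty * rewards.std(dim=0)
    raise ValueError(f"unknown mode: {mode}")

# Example: two members agree on the first sample and disagree on the second,
# so the second sample's reward is discounted.
scores = torch.tensor([[1.0, 2.0],
                       [1.1, 0.2]])
print(conservative_ensemble_reward(scores))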
@_akhaliq
AK
1 year
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms. Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process.
Tweet media one
1
24
120
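For context on the functional forms these "scaling laws" titles echo, Gao et al. (2023) model the gold reward of best-of-n (bon) and RL-optimized policies as a function of the square-root KL distance from the initial policy, roughly (as recalled here, so treat as approximate):

d = \sqrt{ D_{\mathrm{KL}}\left( \pi \,\|\, \pi_{\mathrm{init}} \right) }, \qquad
R_{\mathrm{bon}}(d) = d \left( \alpha_{\mathrm{bon}} - \beta_{\mathrm{bon}}\, d \right), \qquad
R_{\mathrm{RL}}(d) = d \left( \alpha_{\mathrm{RL}} - \beta_{\mathrm{RL}} \log d \right),

with the alpha and beta coefficients fit per proxy reward model and policy size; the DAA work above asks how analogous curves behave for DPO-style objectives.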
@fly51fly
fly51fly
2 years
[LG] Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF. B Zhu, M I. Jordan, J Jiao [UC Berkeley] (2024). - The paper investigates issues of reward overfitting and overoptimization in reinforcement learning from human
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
5
13
@Abbasshaikh42
Abbas
7 months
“Looks like you didn't understand much from Skin in the Game. It states that we are not supposed to be immortal; only our genes. This is aside from, in my general work, the contempt, perhaps even disgust I have for your brand of non-stoical neurotic overoptimization.”
Tweet media one
0
0
5
@farairesearch
FAR.AI
8 months
Come chat with us about our AI Safety papers at #NeurIPS2024!
12/11: 💥 Catastrophic Goodhart: overoptimization in RLHF
12/12: ⚙️ Analysing the Generalisation and Reliability of Steering Vectors
12/12: 🌀 Hypothesis Testing the Circuit Hypothesis in LLMs
12/13: 🔬 InterpBench
🧵👇
1
1
5
@kalomaze
kalomaze
3 months
the overoptimization issues with RLHF/classifier-based RL in my case (for GRPO) seem completely mitigated by hard capping the reward once the preferred-response probability exceeds 50%. (I also multiply the values by 2, so 1.0 reward = "at least 50% or greater preference", 0.5 = "25% preference")
Tweet media one
3
1
66
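A tiny Python sketch of the capping scheme described above, as read here: take the classifier's probability that the response is preferred, hard-cap it at 0.5, then double it so 1.0 means "at least 50% preference" and 0.5 means "25% preference". Function and variable names are mine, not kalomaze's.

def capped_preference_reward(p_preferred: float) -> float:
    """Map a preference probability in [0, 1] to a reward in [0, 1] that
    saturates once the response is preferred at least 50% of the time."""
    return min(p_preferred, 0.5) * 2.0

# 0.25 -> 0.5, 0.5 -> 1.0, 0.9 -> 1.0 (no extra credit past 50% preference)
print(capped_preference_reward(0.25), capped_preference_reward(0.9))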
@mreliwjones
Mr. Eli W. Jones
7 months
there's a sort of instinctive revulsion some of us have to "neurotic overoptimization". it comes from past experiences of being swept up in groups bent on particular forms of overoptimization, and it having disastrous consequences to the group
Tweet media one
1
0
0
@thefrederikbaun
Frederik Baun
1 year
Here’s one thing Andrew Tate and Naval Ravikant have in common – and both get wrong. (And yes, it’s connected to this freshly baked bread.) They’re obsessed with “overoptimization.” Bropreneurs and internet gurus alike preach we should cut out mundane tasks, such as cooking
Tweet media one
1
0
1
@harshit_sikchi
Harshit Sikchi (at RLC 25)
11 months
Our cross-university(s) collaborative work on "Scaling laws for Reward Model Overoptimization in Direct Alignment Algorithms" is accepted at @NeurIPSConf!
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 year
After the LLaMa 3.1 release and ICML, I want to highlight our paper "Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms". TL;DR: we explore the dynamics of over-optimization in DPO/IPO/SLiC and find similar "reward hacking" issues as in online RLHF.👇
Tweet media one
0
4
20