Alexey Gorbatovski Profile
Alexey Gorbatovski

@AMyashka

14 Followers · 9 Following · 0 Media · 7 Statuses

AI Researcher

Joined March 2017
@borisshapa
Boris Shaposhnikov
3 months
1/ 🚀 We’re releasing ESSA: Evolutionary Strategies for Scalable Alignment — a gradient-free, inference-only alternative to RLHF that makes aligning LLMs faster, simpler, and cheaper.👇
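For intuition, here is a minimal sketch of the evolutionary-strategies idea the thread points to: perturb the weights with Gaussian noise, score each perturbation with a black-box reward (forward passes only, no gradients), and step along the reward-weighted average of the noise. This is a toy illustration with assumed names (`reward_fn`, the hyperparameters), not the authors' ESSA implementation.

```python
import numpy as np

def es_step(theta, reward_fn, pop_size=32, sigma=0.02, lr=0.01):
    """One evolution-strategies update: sample Gaussian perturbations,
    score each with the black-box reward, and move the parameters along
    the reward-weighted average of the noise."""
    noise = np.random.randn(pop_size, theta.size)
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    # Standardize rewards so the step size is scale-invariant.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad_est = (adv[:, None] * noise).mean(axis=0) / sigma
    return theta + lr * grad_est

# Toy usage: pull a parameter vector toward a fixed target using only
# reward evaluations, the gradient-free property the tweet highlights.
target = np.ones(8)
reward = lambda th: -np.sum((th - target) ** 2)
theta = np.zeros(8)
for _ in range(200):
    theta = es_step(theta, reward)
```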
@ummagumm_a
Viacheslav Sinii
3 months
1/ @johnschulman2 and @thinkymachines showed that LoRA can match full fine-tuning in many post-training regimes. In our earlier paper, we went even tighter: we train only steering vectors. That's 131K extra params on Llama3.1-8B-Instruct, and it matches full fine-tuning on 6 of the 7 models we studied.
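The parameter count is consistent with one vector per layer: Llama3.1-8B has 32 decoder layers with hidden size 4096, and 32 × 4096 = 131,072 ≈ 131K. Below is an illustrative PyTorch sketch of that setup, adding a trainable steering vector to each layer's residual stream via forward hooks; the wiring in the comments is an assumption about a Hugging Face-style model, not the paper's code.

```python
import torch
import torch.nn as nn

class SteeringVectors(nn.Module):
    """One trainable vector per decoder layer, added to the residual
    stream; with 32 layers x 4096 hidden units this is 131,072 trainable
    parameters, while the base model stays frozen."""
    def __init__(self, num_layers=32, hidden=4096):
        super().__init__()
        self.vectors = nn.Parameter(torch.zeros(num_layers, hidden))

    def hook(self, layer_idx):
        def fn(module, inputs, output):
            # Decoder layers typically return a tuple whose first element
            # is the hidden states; shift those by the layer's vector.
            if isinstance(output, tuple):
                return (output[0] + self.vectors[layer_idx],) + output[1:]
            return output + self.vectors[layer_idx]
        return fn

# Hypothetical wiring against a Hugging Face causal LM:
#   steer = SteeringVectors()
#   for i, layer in enumerate(model.model.layers):
#       layer.register_forward_hook(steer.hook(i))
#   # Optimize only steer.vectors with the chosen post-training objective.
```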
@BredisGeorge
George Bredis
5 months
[1/9] VLMs caption well, but no simple RL method trains them in multi-step simulators and shows gains. VL-DAC, a lightweight RL algorithm on top of a VLM plus cheap sims, yields agents that finish long quests and transfer to skill-specific benchmarks with no tuning. HF link: https://t.co/HIGDN7NisA
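The thread doesn't spell the algorithm out, so the sketch below is only a generic policy-gradient (REINFORCE) loop of the kind lightweight RL-in-simulator setups build on; `policy` (an action head over VLM features) and `env` (a cheap simulator with a reset/step interface) are hypothetical stand-ins, not the VL-DAC method itself.

```python
import torch

def reinforce_episode(policy, env, optimizer, gamma=0.99):
    """Roll the policy through one simulator episode, then reinforce
    action log-probs weighted by baseline-subtracted discounted returns.
    Assumed interface: env.reset() -> obs; env.step(a) -> (obs, r, done)."""
    obs, log_probs, rewards, done = env.reset(), [], [], False
    while not done:
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action.item())
        rewards.append(reward)
    # Discounted return at each step, computed backwards.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    loss = -(torch.stack(log_probs) * (returns - returns.mean())).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return sum(rewards)
```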
@borisshapa
Boris Shaposhnikov
8 months
1/ We recently shared a systematic comparison of DAAs for LLM alignment https://t.co/iqAFQJORcW. We showed that after unifying DAAs under a common framework, the critical difference is not the choice of scalar score, but whether you use pairwise or pointwise objectives (a toy contrast is sketched after the quoted tweet below).
@borisshapa
Boris Shaposhnikov
11 months
[1/10] 🌟 Proud to announce our paper “The Differences Between Direct Alignment Algorithms are a Blur” is now #1 on Hugging Face's Daily Papers list @_akhaliq! We explore DAAs for language model alignment, comparing methods and uncovering insights. https://t.co/aIPbKqB1S1
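To make the pairwise-vs-pointwise distinction concrete: a pairwise objective (DPO-style) trains on the margin between a chosen and a rejected completion, while a pointwise one scores each completion alone. The pointwise variant below is a simplified sketch, not any specific published loss; inputs are assumed to be summed sequence log-probs under the policy and the reference.

```python
import torch
import torch.nn.functional as F

def pairwise_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Pairwise (DPO-style): only the margin between the chosen and
    rejected implicit rewards, beta * (log pi - log ref), enters the loss."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

def pointwise_loss(pi_logp, ref_logp, is_chosen, beta=0.1):
    """Pointwise (simplified sketch): each completion's implicit reward is
    pushed up (if chosen) or down (if rejected) on its own, no pairing."""
    reward = beta * (pi_logp - ref_logp)
    return -F.logsigmoid(torch.where(is_chosen, reward, -reward)).mean()
```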
@borisshapa
Boris Shaposhnikov
2 years
1/9 Thanks to @_akhaliq for sharing. If you have ever wondered why the reference policy in alignment algorithms should remain static, our paper, 🤫 Learn Your Reference Model for Real Good Alignment, is for you.
@_akhaliq
AK
2 years
Learn Your Reference Model for Real Good Alignment

The complexity of the alignment problem stems from the fact that existing methods are unstable. Researchers continuously invent various tricks to address this shortcoming. For instance, in the fundamental Reinforcement…
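The paper's central question is whether the reference policy really has to stay frozen. One way to let it move (a minimal sketch; the mixing rate and update cadence are assumptions, not necessarily the paper's exact recipe) is an EMA-style soft update that slowly drags the reference toward the current policy, so the KL anchor doesn't go stale as the policy improves:

```python
import torch

@torch.no_grad()
def soft_update_reference(policy, ref, alpha=0.01):
    """Drag the frozen reference toward the current policy with an
    exponential moving average: ref <- (1 - alpha) * ref + alpha * policy.
    Calling this every K optimizer steps (or with alpha=1.0 for a hard
    reset) replaces the usual static reference."""
    for p_ref, p_pol in zip(ref.parameters(), policy.parameters()):
        p_ref.mul_(1.0 - alpha).add_(p_pol, alpha=alpha)
```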