Alexey Gorbatovski Profile
Alexey Gorbatovski

@AMyashka

14 Followers · 9 Following · 0 Media · 7 Statuses

AI Researcher

Joined March 2017
@borisshapa
Boris Shaposhnikov
3 months
1/ 🚀 We’re releasing ESSA: Evolutionary Strategies for Scalable Alignment — a gradient-free, inference-only alternative to RLHF that makes aligning LLMs faster, simpler, and cheaper.👇
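For intuition, here is a minimal sketch of the evolutionary-strategies idea the thread points to: perturb the weights with Gaussian noise, score each perturbation with a black-box reward (forward passes only, no gradients), and step along the reward-weighted average of the noise. This is a toy illustration with assumed names (`reward_fn`, the hyperparameters), not the authors' ESSA implementation.

```python
import numpy as np

def es_step(theta, reward_fn, pop_size=32, sigma=0.02, lr=0.01):
    """One evolution-strategies update: sample Gaussian perturbations,
    score each with the black-box reward, and move the parameters along
    the reward-weighted average of the noise."""
    noise = np.random.randn(pop_size, theta.size)
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    # Standardize rewards so the step size is scale-invariant.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad_est = (adv[:, None] * noise).mean(axis=0) / sigma
    return theta + lr * grad_est

# Toy usage: pull a parameter vector toward a fixed target using only
# reward evaluations, the gradient-free property the tweet highlights.
target = np.ones(8)
reward = lambda th: -np.sum((th - target) ** 2)
theta = np.zeros(8)
for _ in range(200):
    theta = es_step(theta, reward)
```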
@ummagumm_a
Viacheslav Sinii
3 months
1/ @johnschulman2 and @thinkymachines showed that LoRA can match full fine-tuning in many post-training regimes. In our earlier paper, we went even tighter: we train only steering vectors. That's 131K extra params on Llama3.1-8B-Instruct, and it matches full fine-tuning on 6 of the 7 models we studied.
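The parameter count is consistent with one vector per layer: Llama3.1-8B has 32 decoder layers with hidden size 4096, and 32 × 4096 = 131,072 ≈ 131K. Below is an illustrative PyTorch sketch of that setup, adding a trainable steering vector to each layer's residual stream via forward hooks; the wiring in the comments is an assumption about a Hugging Face-style model, not the paper's code.

```python
import torch
import torch.nn as nn

class SteeringVectors(nn.Module):
    """One trainable vector per decoder layer, added to the residual
    stream; with 32 layers x 4096 hidden units this is 131,072 trainable
    parameters, while the base model stays frozen."""
    def __init__(self, num_layers=32, hidden=4096):
        super().__init__()
        self.vectors = nn.Parameter(torch.zeros(num_layers, hidden))

    def hook(self, layer_idx):
        def fn(module, inputs, output):
            # Decoder layers typically return a tuple whose first element
            # is the hidden states; shift those by the layer's vector.
            if isinstance(output, tuple):
                return (output[0] + self.vectors[layer_idx],) + output[1:]
            return output + self.vectors[layer_idx]
        return fn

# Hypothetical wiring against a Hugging Face causal LM:
#   steer = SteeringVectors()
#   for i, layer in enumerate(model.model.layers):
#       layer.register_forward_hook(steer.hook(i))
#   # Optimize only steer.vectors with the chosen post-training objective.
```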
@BredisGeorge
George Bredis
5 months
[1/9] VLMs caption well, but no simple RL method trains them in multi-step simulators and shows gains. VL-DAC, a lightweight RL algorithm on top of a VLM plus cheap sims, yields agents that finish long quests and transfer to skill-specific benchmarks with no tuning. HF link: https://t.co/HIGDN7NisA
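The thread doesn't spell the algorithm out, so the sketch below is only a generic policy-gradient (REINFORCE) loop of the kind lightweight RL-in-simulator setups build on; `policy` (an action head over VLM features) and `env` (a cheap simulator with a reset/step interface) are hypothetical stand-ins, not the VL-DAC method itself.

```python
import torch

def reinforce_episode(policy, env, optimizer, gamma=0.99):
    """Roll the policy through one simulator episode, then reinforce
    action log-probs weighted by baseline-subtracted discounted returns.
    Assumed interface: env.reset() -> obs; env.step(a) -> (obs, r, done)."""
    obs, log_probs, rewards, done = env.reset(), [], [], False
    while not done:
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action.item())
        rewards.append(reward)
    # Discounted return at each step, computed backwards.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    loss = -(torch.stack(log_probs) * (returns - returns.mean())).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return sum(rewards)
```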
@borisshapa
Boris Shaposhnikov
8 months
1/ We recently shared a systematic comparison of DAAs for LLM alignment https://t.co/iqAFQJORcW. We showed that after unifying DAAs under a common framework, the critical difference is not the choice of scalar score, but whether you use pairwise or pointwise objectives (a toy contrast is sketched after the quoted tweet below).
@borisshapa
Boris Shaposhnikov
11 months
[1/10] 🌟 Proud to announce our paper “The Differences Between Direct Alignment Algorithms are a Blur” is now #1 on Hugging Face's Daily Papers list @_akhaliq! We explore DAAs for language model alignment, comparing methods and uncovering insights. https://t.co/aIPbKqB1S1
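To make the pairwise-vs-pointwise distinction concrete: a pairwise objective (DPO-style) trains on the margin between a chosen and a rejected completion, while a pointwise one scores each completion alone. The pointwise variant below is a simplified sketch, not any specific published loss; inputs are assumed to be summed sequence log-probs under the policy and the reference.

```python
import torch
import torch.nn.functional as F

def pairwise_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Pairwise (DPO-style): only the margin between the chosen and
    rejected implicit rewards, beta * (log pi - log ref), enters the loss."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

def pointwise_loss(pi_logp, ref_logp, is_chosen, beta=0.1):
    """Pointwise (simplified sketch): each completion's implicit reward is
    pushed up (if chosen) or down (if rejected) on its own, no pairing."""
    reward = beta * (pi_logp - ref_logp)
    return -F.logsigmoid(torch.where(is_chosen, reward, -reward)).mean()
```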
@borisshapa
Boris Shaposhnikov
2 years
1/9 Thanks to @_akhaliq for sharing. If you have ever wondered why the reference policy in alignment algorithms should remain static, our paper, 🤫 Learn Your Reference Model for Real Good Alignment, is for you.
@_akhaliq
AK
2 years
Learn Your Reference Model for Real Good Alignment

The complexity of the alignment problem stems from the fact that existing methods are unstable. Researchers continuously invent various tricks to address this shortcoming. For instance, in the fundamental Reinforcement…
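The paper's central question is whether the reference policy really has to stay frozen. One way to let it move (a minimal sketch; the mixing rate and update cadence are assumptions, not necessarily the paper's exact recipe) is an EMA-style soft update that slowly drags the reference toward the current policy, so the KL anchor doesn't go stale as the policy improves:

```python
import torch

@torch.no_grad()
def soft_update_reference(policy, ref, alpha=0.01):
    """Drag the frozen reference toward the current policy with an
    exponential moving average: ref <- (1 - alpha) * ref + alpha * policy.
    Calling this every K optimizer steps (or with alpha=1.0 for a hard
    reset) replaces the usual static reference."""
    for p_ref, p_pol in zip(ref.parameters(), policy.parameters()):
        p_ref.mul_(1.0 - alpha).add_(p_pol, alpha=alpha)
```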