
Yunhao (Robin) Tang
@robinphysics
Followers 1K · Following 1K · Media 25 · Statuses 128
Interested in RL. Science @MistralAI. Prev Llama post-training @AIatMeta, Gemini post-training and deep RL research @Deepmind, PhD @Columbia
Joined November 2018
RT @MistralAI: Announcing Magistral, our first reasoning model designed to excel in domain-specific, transparent, and multilingual reasonin…
RT @ZacKenton1: Eventually, humans will need to supervise superhuman AI - but how? Can we study it now? We don't have superhuman AI, but w…
Thanks @_akhaliq for promoting our work! Unlike regular RL, where golden rewards r(s,a) are available and online is generally deemed better than offline, in RLHF the picture is less clear. Complementary to some concurrent work, we investigate the causes of the performance gap between online and offline algorithms.
Understanding the performance gap between online and offline alignment algorithms. Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need…
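For readers who want to poke at the distinction the tweet draws, here is a toy numpy sketch (mine, not the paper's setup; every name and constant is illustrative). A softmax bandit stands in for the LLM policy, a Bradley-Terry labeler driven by a "golden" reward simulates preference annotation, and both regimes train with the same DPO-style pairwise loss; the only difference is whether preference pairs come from a fixed behavior policy (offline) or from the current policy (online).

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8                          # arms stand in for responses
r = rng.standard_normal(K)     # "golden" reward, used only to simulate the labeler

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def prefer_first(a, b):
    """Bradley-Terry labeler: returns True if arm a is preferred over arm b."""
    return rng.random() < 1.0 / (1.0 + np.exp(-(r[a] - r[b])))

def dpo_style_grad(theta, a_w, a_l, beta=1.0):
    """Gradient of -log sigmoid(beta * (theta[a_w] - theta[a_l])); with a
    uniform reference policy the DPO log-ratios reduce to logit differences."""
    coef = -beta / (1.0 + np.exp(beta * (theta[a_w] - theta[a_l])))
    g = np.zeros(K)
    g[a_w] += coef
    g[a_l] -= coef
    return g

steps, lr = 2000, 0.1

# Offline: preference pairs are collected once from a fixed behavior policy.
theta_off = np.zeros(K)
behavior = softmax(rng.standard_normal(K))
pairs = [rng.choice(K, size=2, replace=False, p=behavior) for _ in range(steps)]
for a, b in pairs:
    a_w, a_l = (a, b) if prefer_first(a, b) else (b, a)
    theta_off -= lr * dpo_style_grad(theta_off, a_w, a_l)

# Online: each pair is sampled from the *current* policy, then labeled.
theta_on = np.zeros(K)
for _ in range(steps):
    a, b = rng.choice(K, size=2, replace=False, p=softmax(theta_on))
    a_w, a_l = (a, b) if prefer_first(a, b) else (b, a)
    theta_on -= lr * dpo_style_grad(theta_on, a_w, a_l)

for name, th in [("offline", theta_off), ("online", theta_on)]:
    print(name, "expected golden reward:", softmax(th) @ r)
```

Whether (and why) the online loop ends up ahead, e.g. better coverage of high-reward responses versus pure optimization effects, is exactly the kind of question the paper's controlled experiments try to disentangle.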
The findings ought to be taken with a grain of salt due to limitations in our experimental setups. But hopefully this investigation contributes to a better understanding of RLHF practices. Finally, very grateful to my collaborators @GoogleDeepMind on this fun project!
RT @misovalko: Fast-forward ⏩ alignment research from @GoogleDeepMind! Our latest results enhance alignment outcomes in Large Language Mod…
Interested in how **non-contrastive representation learning for RL** is magically equivalent to **gradient-based PCA/SVD on the transition matrix**, and hence won't collapse and will capture spectral info about the transition? Come talk to us at #ICML2023, Hall 1 #308, at 1:30pm.
Interested in how non-contrastive representation learning works in RL? We show (1) why representations do not collapse, and (2) how it relates to gradient-based PCA / SVD of the transition matrix. Understanding Self-Predictive Learning for RL #ICML2023 @GoogleDeepMind
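For the mechanics behind the claim, here is a toy tabular sketch (my own, not the paper's code) of the two-timescale semi-gradient dynamics: the latent predictor W is refit by least squares at each step, the prediction target is stop-gradient, and the representation's span drifts toward the top eigenspace of a symmetric transition-like matrix P rather than collapsing, i.e. the gradient-PCA picture. The PSD construction, step sizes, and alignment metric are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 4  # number of states, representation dimension

# Hypothetical symmetric PSD "transition-like" matrix with a clean spectrum,
# so the top-d eigenspace is well defined.
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = V @ np.diag(np.linspace(1.0, 0.05, n)) @ V.T

def alignment(Phi):
    """Smallest cosine of the principal angles between span(Phi) and the
    top-d eigenspace of P; 1.0 means the subspaces coincide."""
    w, U = np.linalg.eigh(P)
    top = U[:, np.argsort(w)[-d:]]
    Q, _ = np.linalg.qr(Phi)
    return np.linalg.svd(top.T @ Q, compute_uv=False).min()

Phi = 0.1 * rng.standard_normal((n, d))  # representation: one row per state
gram0 = Phi.T @ Phi
eta = 0.05

print("alignment before:", round(alignment(Phi), 3))
for _ in range(3000):
    # Fast timescale: refit the latent predictor W by least squares, mapping
    # Phi(s) to the expected next-state embedding (P @ Phi)(s).
    W = np.linalg.lstsq(Phi, P @ Phi, rcond=None)[0]
    # Slow timescale: semi-gradient step on Phi; the target P @ Phi is
    # treated as stop-gradient (a BYOL/SPR-style target branch).
    Phi = Phi - eta * (Phi @ W - P @ Phi) @ W.T
print("alignment after:", round(alignment(Phi), 3))  # should approach 1.0

# With the optimal W, the continuous-time flow leaves Phi.T @ Phi invariant
# (a toy version of the non-collapse argument); discretization drift is small.
drift = np.linalg.norm(Phi.T @ Phi - gram0) / np.linalg.norm(gram0)
print("relative Gram drift:", round(float(drift), 4))
```

The stop-gradient plus the fast-fitted predictor is what keeps Phi.T @ Phi (approximately) conserved; differentiating through the target instead would reintroduce the collapsed solution Phi = 0.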