
Sangkyu Lee
@oddqueue
M.S. student, Yonsei University
Seoul, Republic of Korea
Joined February 2024
Want to know how to achieve genuine SELF-improvement? ☝️ We present ⚖️ SELF-JUDGE ⚖️, which teaches pairwise comparison as an instruction-tuning task for on-policy alignment learning. All you need is a single policy model! (❌ Reward Model, ❌ Teacher Model) [1/n]
Please check out our work for more details and experiments! 🙌 Paper: Code: (available soon!). This work was done with my best co-authors: @SungdongKim4, @ashkan_yousefpr, @seo_minjoon, @kaniblu, @YoungjaeYu3.
github.com
[ACL 2024] The official implementation of "Aligning Large Language Models by On-Policy Self-Judgment" - oddqueue/self-judge