
Ruohong Zhang
@RuohongZhang
Followers: 117
Following: 89
Media: 12
Statuses: 18
[p1] Improve Visual Language Model Chain-of-thought Reasoning. Paper link: Project page (to be updated upon approval on release): Content: 1. We distill 193K CoT data. 2. Train with SFT. 3. DPO to further improve performance.
3
38
215
RT @tingchenai: Yes, the native voice experience is coming to Grok soon! Let us know what specific features you want to see (or hear)!
0
23
0
RT @lmarena_ai: As part of Chatbot Arena's graduation🎓, we're excited to announce that we changed our X handle to @lmarena_ai! For open-sou…
0
16
0
RT @natolambert: It's not PPO > DPO. It's policy generated data > stale data. In this paper, we answer this question by performing a rigo…
0
78
0
RT @stefan_fee: Crazy finding!!!!! -> "Without introducing any additional data or advanced training techniques, and merely by reformatt…
0
23
0
RT @EdwardSun0909: 🌟 Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision 🌟 How can we keep im…
0
57
0