
Rafael Rafailov @ NeurIPS
@rm_rafailov
Followers 7K · Following 3K · Media 122 · Statuses 1K
Ph.D. Student at @StanfordAILab. I work on Foundation Models and Decision Making. Previously @GoogleDeepMind @UCBerkeley
Stanford, CA
Joined May 2023
It’s the future.
Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 @JoshuaK92829 @ApratimDey2 @MGerstgrasser @rm_rafailov @sanmikoyejo 1/7
RT @synth_labs: Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training…
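The retweet only describes ALP at a high level. One way to read "inverse difficulty penalties" is: estimate per-prompt difficulty from solve rates over rollouts, then apply a penalty that shrinks as difficulty grows. A hedged sketch of that reading (my own illustration, with a hypothetical `alp_penalties` helper, not SynthLabs' actual code):

```python
import numpy as np

def alp_penalties(rollout_successes, base_penalty=0.1):
    """Sketch of an inverse-difficulty penalty (illustrative, not ALP's real code).

    rollout_successes: (num_prompts, num_rollouts) array of 0/1 outcomes.
    Easy prompts (high solve rate) get a larger penalty; hard ones get less.
    """
    solve_rate = rollout_successes.mean(axis=1)   # per-prompt solve rate over rollouts
    difficulty = 1.0 - solve_rate                 # low solve rate = hard prompt
    return base_penalty * (1.0 - difficulty)      # penalty inverse to difficulty

successes = np.array([[1, 1, 1, 1],   # easy prompt: solved in every rollout
                      [1, 0, 0, 0],   # hard prompt: solved once
                      [0, 0, 0, 0]])  # unsolved prompt: no penalty at all
print(alp_penalties(successes))       # penalty decreases with difficulty
```

Under this reading, the penalty (e.g. on response length or some regularizer) is withdrawn exactly where the model still struggles, focusing training pressure on prompts it already solves.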
RT @ZiyuX: Check out this work on benchmarking how well LLMs can implement ML research papers into code, led by @tianyu_hua!
It’s been very surprising how few people understand this.
Perhaps surprisingly, taking a KL estimate as a `kl_loss` term to minimize does *not* enforce the KL constraint. This implementation is nonetheless quite common in open-source RL repos and recent research papers. In short: the gradient of an unbiased KL estimate is not an unbiased estimate of the KL gradient.
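To make this concrete, here is a minimal numpy sketch (my own illustration, not code from the thread or any repo). For a categorical policy pi = softmax(theta) and a fixed reference q, the common `kl_loss` implementation averages log pi(x) - log q(x) over samples x ~ pi and differentiates through it with the samples held fixed. That drops the score-function term, and the resulting gradient estimator has expectation exactly zero, while the true gradient of KL(pi || q) does not:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=5)
pi = np.exp(theta) / np.exp(theta).sum()   # categorical policy, softmax(theta)
q = np.array([0.4, 0.2, 0.2, 0.1, 0.1])   # fixed reference distribution

log_ratio = np.log(pi) - np.log(q)
kl = np.sum(pi * log_ratio)                # KL(pi || q)

# (a) True gradient w.r.t. logits: dKL/dtheta_j = pi_j * (log_ratio_j - KL),
#     using d pi_i / d theta_j = pi_i * (delta_ij - pi_j).
true_grad = pi * (log_ratio - kl)

# (b) Expected "naive" gradient: E_{x~pi}[ grad_theta log pi(x) ] with the
#     samples held fixed. Since grad_theta log pi(x) = onehot(x) - pi,
#     the expectation is sum_i pi_i * (e_i - pi) = pi - pi = 0, identically.
naive_grad = sum(pi[i] * ((np.arange(5) == i) - pi) for i in range(5))

print("true grad :", np.round(true_grad, 4))   # nonzero
print("naive grad:", np.round(naive_grad, 4))  # all zeros
```

So the `kl_loss` estimator is unbiased for the KL *value*, but its gradient carries no signal about the KL at all: in expectation it pushes the policy nowhere.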
I make the AI, very nice!
congrats @rm_rafailov on your hard-earned acceptance to the USofA as alien of officially extraordinary ability. The alien piece comes as no surprise to your mates of course, but at least the general public now has fair warning and a fighting chance. To celebrate with a fitting
RT @JamesAlcorn94: congrats @rm_rafailov on your hard-earned acceptance to the USofA as alien of officially extraordinary ability. The alie…
When we first published our work on this 9 months ago, it was rejected for being impractical in realistic cases. Six months later, it was rejected for lack of novelty. That's the way academic publishing goes.
Another generative / inference-time scaling reward modeling paper. It's the direction things are going.
(Meta) CoTs are search inside world models (the prompt is the goal specification).
Are world models necessary to achieve human-level agents, or is there a model-free shortcut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
RT @jaseweston: 🚨 New paper 🚨 J1: Incentivizing Thinking in LLM-as-a-Judge via RL - Converts judgement task into a verifiable one for both…
RT @jyangballin: 40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synt…
RT @aviral_kumar2: At #ICLR25 workshops, my students+collabs will give many oral talks on newer stuff (don't miss!): - robot VLA RL fine-…
RT @SurajNair_1: Since the first year of my PhD, every talk I’ve given has opened with a slide about the distant north star: dropping a rob…