Rafael Rafailov @ NeurIPS

@rm_rafailov

Followers: 7K · Following: 3K · Media: 122 · Statuses: 1K

Ph.D. Student at @StanfordAILab. I work on Foundation Models and Decision Making. Previously @GoogleDeepMind @UCBerkeley

Stanford, CA
Joined May 2023
@rm_rafailov
Rafael Rafailov @ NeurIPS
6 months
We have a new position paper on "inference time compute" and what we have been working on for the last few months! We present some theory on why it is necessary, how it works, why we need it, and what it means for "super" intelligence.
@rm_rafailov
Rafael Rafailov @ NeurIPS
15 days
Prefill the replay buffer, guys.
@jaseweston
Jason Weston
17 days
🌉 Bridging Offline & Online RL for LLMs 🌉
📝 New paper shows, on verifiable & non-verifiable tasks:
- Online DPO & GRPO give similar performance.
- Semi-online (iterative) DPO with sync every s steps (more efficient!) works very well also.
- Offline DPO
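For context, what separates offline, semi-online, and online DPO in the quoted setup is only how often the preference data is regenerated from the current policy; the loss itself stays the same. Below is a minimal sketch of that schedule, assuming hypothetical helpers `sample_preference_pairs` and `policy_logprobs` (mine, not from the paper), with the standard DPO objective written out in PyTorch.

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_lp, pi_rejected_lp, ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    pi_logratio = pi_chosen_lp - pi_rejected_lp
    ref_logratio = ref_chosen_lp - ref_rejected_lp
    return -F.logsigmoid(beta * (pi_logratio - ref_logratio)).mean()

# Semi-online schedule (sketch): s = 1 recovers fully online DPO,
# s = num_steps recovers offline DPO, anything in between is "semi-online".
def train_semi_online(policy, ref, prompts, optimizer, num_steps, s):
    buffer = None
    for step in range(num_steps):
        if step % s == 0:
            # Hypothetical helper: sample responses from the CURRENT policy
            # and label chosen/rejected pairs (e.g. with a verifier or judge).
            buffer = sample_preference_pairs(policy, prompts)
        batch = buffer.sample()  # hypothetical preference buffer
        loss = dpo_loss(
            policy_logprobs(policy, batch.chosen),    # hypothetical helper
            policy_logprobs(policy, batch.rejected),
            policy_logprobs(ref, batch.chosen),
            policy_logprobs(ref, batch.rejected),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```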
@rm_rafailov
Rafael Rafailov @ NeurIPS
17 days
It’s the future.
@RylanSchaeffer
Rylan Schaeffer
17 days
Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 @JoshuaK92829 @ApratimDey2 @MGerstgrasser @rm_rafailov @sanmikoyejo 1/7
@rm_rafailov
Rafael Rafailov @ NeurIPS
22 days
RT @synth_labs: Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training…
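The RT is truncated, so the exact ALP formulation is not shown here; the sketch below is only one plausible reading of "monitor solve rates across rollouts and apply inverse difficulty penalties": estimate each prompt's difficulty from the empirical solve rate of its rollout group, then scale a penalty (here, hypothetically, on response length) by that solve rate, so easy prompts pay a larger cost.

```python
import numpy as np

def inverse_difficulty_penalties(rewards, lengths, alpha=0.1):
    """One plausible reading, not the paper's exact formulation.
    rewards: (n_rollouts,) binary solved/unsolved for ONE prompt's rollout group
    lengths: (n_rollouts,) response lengths for the same rollouts
    Returns penalized rewards where the penalty grows with the solve rate
    (high solve rate = easy prompt = larger penalty)."""
    rewards = np.asarray(rewards, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    solve_rate = rewards.mean()                    # difficulty proxy, in [0, 1]
    norm_len = lengths / max(lengths.max(), 1.0)   # normalize lengths to [0, 1]
    return rewards - alpha * solve_rate * norm_len

# Example: an easy prompt (4/5 solved) gets a noticeable penalty,
# a hard prompt (1/5 solved) is left almost untouched.
print(inverse_difficulty_penalties([1, 1, 1, 1, 0], [120, 300, 80, 500, 400]))
print(inverse_difficulty_penalties([0, 0, 1, 0, 0], [120, 300, 80, 500, 400]))
```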
@rm_rafailov
Rafael Rafailov @ NeurIPS
25 days
No way man, one sample is all you need to collapse!
@teortaxesTex
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
25 days
How is model collapse still debated seriously? Just stop. This is naiveté that belongs in 2023.
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 month
RT @ZiyuX: Check out this work on benchmarking how well LLMs can implement ML research papers into code, led by @tianyu_hua!
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 month
It’s been very surprising how few people understand this.
@robinphysics
Yunhao (Robin) Tang
1 month
Maybe to one's surprise, taking KL estimates as `kl_loss` to minimize does *not* enforce the KL. This implementation, however, is quite common in open source RL repos and recent research papers. In short: grad of an unbiased KL estimate is not an unbiased estimate of KL grad.
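A concrete way to see the point in the quoted tweet: on a toy categorical policy, backpropagating through the k1 estimate log pi(x) - log ref(x) gives a gradient whose expectation is zero (since E[grad log pi] = 0), not the KL gradient, whereas a score-function surrogate does recover it. A minimal PyTorch check, my own illustration rather than the tweet author's code:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, requires_grad=True)   # policy pi_theta over 5 tokens
ref_logits = torch.randn(5)                   # frozen reference policy

def log_probs(l):
    return l - torch.logsumexp(l, dim=-1, keepdim=True)

# Exact KL(pi || ref) and its true gradient via autograd.
lp, lr = log_probs(logits), log_probs(ref_logits)
true_kl = (lp.exp() * (lp - lr)).sum()
(true_grad,) = torch.autograd.grad(true_kl, logits)

# Monte-Carlo samples from the current policy.
n = 200_000
x = torch.distributions.Categorical(logits=logits.detach()).sample((n,))
lp_x = log_probs(logits)[x]        # log pi(x), differentiable w.r.t. logits
lr_x = log_probs(ref_logits)[x]    # log ref(x), constant

# (a) Common pattern: treat the unbiased k1 estimate as a loss and backprop.
#     Its gradient has expectation E[grad log pi] = 0, NOT the KL gradient.
k1_loss = (lp_x - lr_x).mean()
(grad_k1,) = torch.autograd.grad(k1_loss, logits, retain_graph=True)

# (b) Score-function surrogate: detach the estimate and multiply by log pi(x).
#     Its gradient, E[(log pi - log ref) * grad log pi], IS the KL gradient.
surrogate = ((lp_x - lr_x).detach() * lp_x).mean()
(grad_sur,) = torch.autograd.grad(surrogate, logits)

print("true KL grad     :", true_grad)
print("grad of k1 'loss':", grad_k1)    # ~ 0 up to Monte-Carlo noise
print("surrogate grad   :", grad_sur)   # matches true_grad up to Monte-Carlo noise
```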
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 month
I make the AI, very nice!
@JamesAlcorn94
James Alcorn
1 month
congrats @rm_rafailov on your hard-earned acceptance to the USofA as alien of officially extraordinary ability. The alien piece comes as no surprise to your mates of course, but at least the general public now has fair warning and a fighting chance. To celebrate with a fitting
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 month
RT @JamesAlcorn94: congrats @rm_rafailov on your hard-earned acceptance to the USofA as alien of officially extraordinary ability. The alie…
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 month
When we first published our work on this 9 months ago, it was rejected for being impractical in realistic cases. Six months later it was rejected for lack of novelty. It's the way academic publishing goes.
@natolambert
Nathan Lambert
1 month
Another generative / inference-time scaling reward modeling paper. It's the direction things are going.
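For readers unfamiliar with the term: a "generative" reward model scores a response by first writing out a judgment in natural language and only then emitting a score, and "inference-time scaling" here just means sampling that judge several times and aggregating. A minimal sketch, assuming a hypothetical `judge(prompt) -> str` completion function rather than any specific paper's interface:

```python
import re
import statistics
from typing import Callable

JUDGE_TEMPLATE = """You are grading an answer.
Question: {question}
Answer: {answer}
Think step by step about correctness and clarity, then end with 'Score: X' where X is 1-10."""

def generative_reward(question: str, answer: str,
                      judge: Callable[[str], str], k: int = 8) -> float:
    """Sample the judge k times (inference-time scaling) and average the parsed scores."""
    scores = []
    for _ in range(k):
        critique = judge(JUDGE_TEMPLATE.format(question=question, answer=answer))
        match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", critique)
        if match:
            scores.append(float(match.group(1)))
    return statistics.mean(scores) if scores else 0.0
```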
@rm_rafailov
Rafael Rafailov @ NeurIPS
1 month
(Meta) CoTs are search inside world models (the prompt is the goal specification).
@jonathanrichens
Jon Richens
1 month
Are world models necessary to achieve human-level agents, or is there a model-free short-cut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
@rm_rafailov
Rafael Rafailov @ NeurIPS
2 months
RT @jaseweston: 🚨 New paper 🚨 J1: Incentivizing Thinking in LLM-as-a-Judge via RL. - Converts judgement task into a verifiable one for both…
@rm_rafailov
Rafael Rafailov @ NeurIPS
2 months
RT @jyangballin: 40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synt…
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
GenRMs.
@hr0nix
hr0nix
3 months
LLMs trained to evaluate agentic trajectories give us a powerful way to boost agent performance via test-time search. But single-pass value models have their limitations. Can CoT reasoners be a better alternative? We explore this topic in our latest research blog post. 🧵⬇️
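Stripped of the blog post's specifics, the underlying pattern is simple: roll out several candidate trajectories, score each with an evaluator (a scalar value model in the single-pass case, or a CoT judge as proposed), and keep the best. A hypothetical sketch of that simplest form of test-time search, with `rollout` and `evaluate` standing in for whatever agent and evaluator you have:

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")  # a trajectory, in whatever form your agent framework produces

def best_of_n(task: str,
              rollout: Callable[[str], T],          # hypothetical: run the agent once on the task
              evaluate: Callable[[str, T], float],  # hypothetical: value model or CoT-judge score
              n: int = 16) -> Tuple[T, float]:
    """Test-time search in its simplest form: sample n trajectories, keep the best-scored one."""
    best_traj, best_score = None, float("-inf")
    for _ in range(n):
        traj = rollout(task)
        score = evaluate(task, traj)
        if score > best_score:
            best_traj, best_score = traj, score
    return best_traj, best_score
```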
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
RT @aviral_kumar2: At #ICLR25 workshops, my students+collabs will give many oral talks on newer stuff (don't miss!): - robot VLA RL fine-…
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
And again…
@DBahdanau
🇺🇦 Dzmitry Bahdanau
3 months
I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference! Code: Blog:
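The "in-flight weight update" idea can be sketched with nothing more than two threads and a shared, versioned weight slot: generators check for new weights between decoding chunks instead of waiting for every sequence in the batch to finish. The toy below is my own illustration of the scheduling pattern, not PipelineRL's actual implementation.

```python
import threading
import time
from dataclasses import dataclass, field

@dataclass
class WeightStore:
    """Latest policy weights plus a version counter, shared by trainer and generators."""
    version: int = 0
    weights: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def publish(self, new_weights):
        with self.lock:
            self.version += 1
            self.weights = new_weights

    def snapshot(self):
        with self.lock:
            return self.version, self.weights

def generator(store: WeightStore, n_tokens: int = 20):
    version, _ = store.snapshot()
    for t in range(n_tokens):
        time.sleep(0.01)                      # stand-in for decoding one token/chunk
        new_version, _ = store.snapshot()
        if new_version != version:            # in-flight update: swap weights mid-sequence
            print(f"  generator switched v{version} -> v{new_version} at token {t}")
            version = new_version
    print(f"  sequence finished on weights v{version}")

def trainer(store: WeightStore, n_updates: int = 3):
    for step in range(n_updates):
        time.sleep(0.05)                      # stand-in for an optimizer step
        store.publish({"step": step + 1})     # push weights without pausing generation

store = WeightStore()
threads = [threading.Thread(target=generator, args=(store,)) for _ in range(2)]
threads.append(threading.Thread(target=trainer, args=(store,)))
for t in threads:
    t.start()
for t in threads:
    t.join()
```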
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
Meta-Search.
@jiayi_pirate
Jiayi Pan
3 months
We explore a new dimension in scaling reasoning models in Adaptive Parallel Reasoning. APR lets LMs learn to orchestrate both serial & parallel compute E2E via supervised training + RL — w/ better efficiency and scalability than long CoT on Countdown. 🧵 
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
RT @SurajNair_1: Since the first year of my PhD, every talk I’ve given has opened with a slide about the distant north star: dropping a rob…
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
RT @agarwl_: Post-training is going to become training.
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
It strikes again.
@PrimeIntellect
Prime Intellect
3 months
Asynchronous RL completely eliminates communication bottlenecks. Our ablation studies confirm we maintain performance even with 4-step delays, making decentralized training viable with weak global interconnects.
@rm_rafailov
Rafael Rafailov @ NeurIPS
3 months
“We developed a fully asynchronous online RL training framework that enhanced flexibility. … This innovation resulted in a ~10x improvement in training efficiency over previous generations.” Async distributed RL strikes again!