
Jay
@jayendra_ram
Followers: 2K · Following: 4K · Media: 60 · Statuses: 1K
building simulations. founder @hud_evals, prev cs+physics @columbia, @ycombinator
SF, NYC
Joined September 2022
Since everyone is talking about RL environments and GRPO now, but no one knows how they work, we thought it would be cool to make an explainer video + code you can run. This is an example of using GRPO to train Qwen 2.5 to play 2048 (code in thread) 🧵:
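For readers who have not seen GRPO before, the core idea can be sketched in a few lines: sample a group of completions for the same prompt, score each one, and normalize each reward against its own group rather than against a learned value function. This is a minimal sketch and not the code from the thread; the function name `group_relative_advantages` is hypothetical, and using the population standard deviation is an assumption (implementations differ on this detail).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    `rewards` holds one scalar score per completion sampled for the
    same prompt. Population std and `eps` are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions that beat the group average get a positive advantage,
    # the rest get a negative one; eps avoids division by zero.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled games of 2048 scored by (hypothetical) final score.
advantages = group_relative_advantages([128.0, 256.0, 64.0, 512.0])
```

The advantages then weight the policy-gradient update for each completion's tokens; the clipping and KL terms of the full objective are left out here.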
Since RL environments are becoming a lot more mainstream, it probably makes sense to explain them for people who see them vagueposted incessantly on the TL. An RL environment is the "world" or "problem space" in which a reinforcement learning (RL) agent operates in order to learn.
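To make that definition concrete, here is a rough sketch of what an environment can look like in code: something that holds state, accepts an action, and hands back an observation, a reward, and a done flag. Everything here is an assumption for illustration. `Row2048Env` is a hypothetical one-row toy variant of 2048, not the environment from the thread, and the `reset`/`step` shape just follows a common convention.

```python
import random


class Row2048Env:
    """Toy one-row variant of 2048 (hypothetical, for illustration only)."""

    def __init__(self, size=4, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.row = [0] * size

    def reset(self):
        """Start a new episode and return the first observation."""
        self.row = [0] * self.size
        self._spawn()
        return list(self.row)

    def _spawn(self):
        # Place a 2 in a random empty cell, if any cell is empty.
        empty = [i for i, v in enumerate(self.row) if v == 0]
        if empty:
            self.row[self.rng.choice(empty)] = 2

    @staticmethod
    def _slide_left(row):
        # Pack tiles to the left, merging equal neighbours once per move.
        tiles = [v for v in row if v]
        merged, reward, i = [], 0, 0
        while i < len(tiles):
            if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
                merged.append(tiles[i] * 2)
                reward += tiles[i] * 2
                i += 2
            else:
                merged.append(tiles[i])
                i += 1
        return merged + [0] * (len(row) - len(merged)), reward

    def step(self, action):
        """action: 0 = slide left, 1 = slide right.

        Returns (observation, reward, done).
        """
        row = self.row if action == 0 else self.row[::-1]
        new_row, reward = self._slide_left(row)
        if action == 1:
            new_row = new_row[::-1]
        if new_row != self.row:
            self.row = new_row
            self._spawn()
        # The episode ends when neither direction would change the row.
        done = all(
            self._slide_left(r)[0] == r for r in (self.row, self.row[::-1])
        )
        return list(self.row), reward, done


# Usage: roll out a random policy for at most 200 steps.
env = Row2048Env(seed=0)
obs, total, policy = env.reset(), 0, random.Random(1)
for _ in range(200):
    obs, reward, done = env.step(policy.choice([0, 1]))
    total += reward
    if done:
        break
```

The agent only ever sees what `reset` and `step` return, so the reward computed inside `step` is the entire learning signal.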
If you want to learn the dark art of how to make a kino (not slop) env, though, you should join us at @hud_evals.
I usually don’t talk about RL envs on the tl out of respect for our customers but this take won’t age well. Making problems that provide signal for models to get better is pretty hard, and is only going to get harder every year as models improve. The notion that you can vibe.
the only RL envs frontier labs will continue buying in the medium term are the idiosyncratic high value ones. and if you’re building a high value env, why not just do the RL in house, deploy vertically, and capture orders of magnitude more value?