Jay

@jayendra_ram

Followers 2K · Following 4K · Media 60 · Statuses 1K

building simulations. founder @hud_evals, prev cs+physics @columbia, @ycombinator

SF, NYC
Joined September 2022
@jayendra_ram
Jay
7 days
Since everyone is talking about RL environments and GRPO now but no one knows how they work, we thought it would be cool to make an explainer video + code you can run. This is an example of using GRPO to train Qwen 2.5 to play 2048 (code in thread) 🧵
25
162
2K
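Not the thread's actual code, but a minimal sketch of the part that makes GRPO different from PPO: instead of a learned value baseline, each rollout's reward is normalized against the mean and std of a group of rollouts sampled from the same prompt.

```python
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantage: (reward - group mean) / group std."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard the all-equal case
    return [(r - mean) / std for r in group_rewards]

# e.g. four rollouts of the same 2048 prompt, scored by the max tile reached
print(grpo_advantages([256.0, 512.0, 1024.0, 512.0]))
```

The policy gradient then weights each rollout's token log-probs by its group-relative advantage, so rollouts only need to beat their siblings, not an absolute bar.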
@jayendra_ram
Jay
1 day
IMO the only reasons CUA agents aren't more prevalent are 1) speed, 2) cost, and 3) inability to do long-horizon tasks reliably (in that order). The models are already quite good for many important tasks. Scaling compute will fix 1) and 2) in ~1 year.
3
0
28
@jayendra_ram
Jay
2 days
There's been a lot of cloudflare hate recently but tbh they're justified in wanting to restrict ai agents. Until very recently, the de facto business model of the internet was ads. Ads require a human being to view your website and interact with it. If most of your site traffic…
1
0
13
@jayendra_ram
Jay
5 days
Getting a lot of people telling me LLMs can accurately simulate human behavior. This isn't true. If LLMs could truly model human preferences, markets would be obsolete. Prices are just a clumsy way to guess what people want. Perfect simulation would make price signals unnecessary.
3
2
30
@jayendra_ram
Jay
5 days
This is downstream of the fact that we’re able to model basically everything in the physical world except ourselves.
1
0
12
@jayendra_ram
Jay
5 days
This is my new favorite hub.
@PrimeIntellect
Prime Intellect
5 days
Introducing the Environments Hub. RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down. We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI
4
5
123
@jayendra_ram
Jay
6 days
You can't make an RL env out of things that require human behavior inside the env. Ex: you can't have an accurate RL environment that simulates a Twitch streamer interacting with their fans, because that requires accurate simulation of the human utility function.
19
6
183
@jayendra_ram
Jay
6 days
Ofc a large obstacle to this is ChatGPT not giving you all the traces, but it may be possible with other models.
1
0
10
@jayendra_ram
Jay
6 days
This may be the blueprint for AI apps moving forward: 1) Make a ChatGPT/Claude wrapper that users love. 2) Collect production traces and create evals. 3) SFT an OSS model on the traces and RL on the evals to get parity with ChatGPT/Claude. Similar quality and lower costs.
27
14
291
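A minimal sketch of the step 2 → step 3 glue, assuming your wrapper logs OpenAI-style message lists to JSONL (the paths and schema here are made up for illustration): split each trace into a prompt and the final assistant reply as the completion target.

```python
import json

def traces_to_sft(in_path: str, out_path: str) -> None:
    with open(in_path) as f, open(out_path, "w") as out:
        for line in f:
            msgs = json.loads(line)["messages"]
            if msgs[-1]["role"] != "assistant":
                continue  # only keep traces that end in a model reply
            # everything before the final assistant turn is the prompt,
            # that final assistant turn is the completion target
            pair = {"prompt": msgs[:-1], "completion": msgs[-1]["content"]}
            out.write(json.dumps(pair) + "\n")

traces_to_sft("traces.jsonl", "sft_data.jsonl")
```

The resulting prompt/completion pairs can be fed to any SFT trainer (e.g. TRL's SFTTrainer) before the RL-on-evals stage.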
@jayendra_ram
Jay
7 days
Code to run this 2048 example on Qwen 2.5 is here:
0
4
54
@jayendra_ram
Jay
7 days
Training using an env can be broken into a 3-step process. Step 1: Generate trajectories from the environment: the model (agent) takes actions like “move left/right/up/down” and generates a trajectory of moves until the board reaches 2048. Step 2: Assign rewards: each…
2
1
51
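The linked repo has the real thing; below is just a self-contained toy showing the shape of steps 1 and 2, with a fake 2048 env and a random policy standing in for Qwen 2.5 (both stand-ins are mine, not from the thread).

```python
import random

ACTIONS = ["left", "right", "up", "down"]

class Toy2048:
    """Stand-in env: the real merge rules live in the linked repo."""
    def reset(self):
        self.best_tile = 2
        return self.best_tile

    def step(self, action):
        # pretend each move has some chance of doubling the best tile
        if random.random() < 0.5:
            self.best_tile *= 2
        done = self.best_tile >= 2048
        return self.best_tile, done

def rollout(env, max_steps=200):
    # Step 1: generate a trajectory of (state, action) pairs
    state, traj = env.reset(), []
    for _ in range(max_steps):
        action = random.choice(ACTIONS)  # the trained model would choose here
        state, done = env.step(action)
        traj.append((state, action))
        if done:
            break
    # Step 2: assign a reward to the whole trajectory (here: best tile / 2048)
    return traj, state / 2048

traj, reward = rollout(Toy2048())
print(len(traj), reward)
```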
@jayendra_ram
Jay
9 days
Since RL environments are becoming a lot more mainstream it probably makes sense to explain them for people who see them vagueposted incessantly on the TL. An RL environment is the "world" or "problem space" in which a reinforcement learning (RL) agent operates in order to learn.
@jxnlco
jason liu
9 days
Can someone explain to me what an RL environment is?
3
8
175
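To make that definition concrete, here's a minimal sketch of the usual env contract (gym-style reset/step; a toy example of mine, not anything from HUD): the "world" hides its state, and the agent only ever sees observations and rewards.

```python
import random

class GuessEnv:
    """Toy problem space: guess a hidden digit; reward only on success."""
    def reset(self):
        self.target = random.randint(0, 9)
        return 0.0                       # initial observation

    def step(self, action: int):
        reward = 1.0 if action == self.target else 0.0
        done = action == self.target
        return 0.0, reward, done         # (observation, reward, done)

env = GuessEnv()
env.reset()
for guess in range(10):                  # a trivially dumb "agent"
    _, reward, done = env.step(guess)
    if done:
        print("solved with reward", reward)
        break
```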
@jayendra_ram
Jay
11 days
The biggest bottleneck to RL environments scaling may unironically be docker build times.
1
0
9
@jayendra_ram
Jay
19 days
If you want to learn the dark art of how to make a kino (not slop) env, though, you should join us at @hud_evals.
1
0
27
@jayendra_ram
Jay
19 days
I usually don’t talk about RL envs on the TL out of respect for our customers but this take won’t age well. Making problems that provide signal for models to get better is pretty hard, and is only going to get harder every year as models improve. The notion that you can vibe…
@khoomeik
Rohan Pandey
19 days
the only RL envs frontier labs will continue buying in the medium term are the idiosyncratic high value ones. and if you’re building a high value env, why not just do the RL in house, deploy vertically, and capture orders of magnitude more value?
7
6
217
@jayendra_ram
Jay
27 days
Kind of crazy how gpt-oss mogs everything from China. If they ever release r2 it’ll have to be multimodal to be relevant at all.
2
4
25
@jayendra_ram
Jay
27 days
> gpt-oss has no multimodal support. It’s so over.
1
0
16
@jayendra_ram
Jay
28 days
The impact of gpt-oss might be much bigger than gpt-5. It almost feels wrong to have something this powerful be oss. Can think of so many use cases.
2
1
31