Jay @jayendra_ram X Profile

Jay

@jayendra_ram

Followers

2K

Following

4K

Media

60

Statuses

1K

building simulations. founder @hud_evals, prev cs+physics @columbia, @ycombinator

SF, NYC

Joined September 2022

Jay

@jayendra_ram

7 days

Since everyone is talking about RL environments and GRPO now but no one knows how they work, we thought it would be cool to make an explainer video + code you can run. This is an example of using GRPO to train Qwen 2.5 to play 2048 (code in thread) 🧵:

25

162

2K
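
For readers who want the core idea behind GRPO in a few lines, here is a minimal sketch of the group-relative advantage step. It is not the code from the thread: the function name, the `eps` constant, and the reward values are illustrative assumptions. The idea is that several rollouts are sampled from the same starting point, and each rollout's reward is normalized against the mean and standard deviation of its own group, so no separate value network is needed.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `eps` is an assumed small constant that avoids division by zero
    when every rollout in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical scores for four rollouts sampled from the same 2048 board.
rewards = [128.0, 256.0, 512.0, 128.0]
advantages = group_relative_advantages(rewards)

# Rollouts above the group mean get a positive advantage, the rest negative.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:6.1f}  advantage={advantage:+.3f}")
```

In a full training loop these advantages would weight the policy-gradient update for the tokens of each rollout; that part is omitted here.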

Jay

@jayendra_ram

1 day

IMO the only reason CUA agents aren't more prevalent is because of 1) speed, 2) cost and 3) inability to do long horizon tasks reliably (in that order). The models are already quite good for many important tasks. Scaling compute will fix 1) and 2) in ~1 year.

3

0

28

Jay

@jayendra_ram

2 days

There's been a lot of cloudflare hate recently but tbh they're justified in wanting to restrict ai agents. Until very recently, the de facto business model of the internet was ads. Ads require a human being to view your website and interact with it. If most of your site traffic

1

0

13

Jay

@jayendra_ram

5 days

Getting a lot of people telling me LLMs can accurately simulate human behavior. This isn't true. If LLMs could truly model human preferences, markets would be obsolete. Prices are just a clumsy way to guess what people want. Perfect simulation would make price signals.

3

2

30

Jay

@jayendra_ram

5 days

This is downstream of the fact that we’re able to model basically everything in the physical world except ourselves.

1

0

12

Jay

@jayendra_ram

5 days

This is my new favorite hub.

Prime Intellect

@PrimeIntellect

5 days

Introducing the Environments Hub. RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down. We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

4

5

123

Jay

@jayendra_ram

6 days

You can't make an RL env out of things that require human behavior inside of the env. Ex: you can't have an accurate RL environment that simulates a Twitch streamer interacting with their fans, because that requires accurate simulation of the human utility function.

19

6

183

Jay

@jayendra_ram

6 days

Ofc a large obstacle to this is ChatGPT not giving you all the traces, but it may be possible with other models.

1

0

10

Jay

@jayendra_ram

6 days

This may be the blueprint for AI apps moving forward: 1) Make ChatGPT/Claude wrapper that users love. 2) Collect production traces and create evals. 3) SFT an OSS model on the traces and RL on the evals to get parity with ChatGPT/Claude. Similar quality and lower costs.

27

14

291

Jay

@jayendra_ram

7 days

Code to run this 2048 example on Qwen 2.5 is here:

0

4

54

Jay

@jayendra_ram

7 days

Training using an env can be broken into a 3-step process: Step 1: Generate trajectories from the environment: The model (agent) takes actions like “move left/right/up/down” and generates a trajectory of moves until the board reaches 2048. Step 2: Assign rewards: Each

2

1

51
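
The first two steps described above can be sketched roughly as follows. This is an illustrative toy, not the code linked in the thread: `random_policy` stands in for the model, and `episode_reward` (the sum of merged tile values) is an assumed reward, since the post is cut off before it says how rewards are assigned.

```python
import random

SIZE = 4
ACTIONS = ("left", "right", "up", "down")

def slide_row_left(row):
    """Slide one row left, merging equal neighbors once; return (row, points)."""
    tiles = [t for t in row if t != 0]
    merged, points, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            points += tiles[i] * 2
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (SIZE - len(merged)), points

def move(board, action):
    """Apply one action to a copy of the board; return (new_board, points)."""
    if action in ("up", "down"):
        board = [list(col) for col in zip(*board)]  # transpose
    if action in ("right", "down"):
        board = [row[::-1] for row in board]  # mirror
    results = [slide_row_left(row) for row in board]
    new = [row for row, _ in results]
    points = sum(p for _, p in results)
    if action in ("right", "down"):
        new = [row[::-1] for row in new]
    if action in ("up", "down"):
        new = [list(col) for col in zip(*new)]
    return new, points

def spawn_tile(board, rng):
    """Place a 2 (or occasionally a 4) on a random empty cell, in place."""
    empty = [(r, c) for r in range(SIZE) for c in range(SIZE) if board[r][c] == 0]
    if empty:
        r, c = rng.choice(empty)
        board[r][c] = 4 if rng.random() < 0.1 else 2

def random_policy(board, rng):
    """Placeholder for the model: picks a move uniformly at random."""
    return rng.choice(ACTIONS)

def rollout(policy, rng, max_steps=200, target=2048):
    """Step 1: play one game and record (board, action, points) per move."""
    board = [[0] * SIZE for _ in range(SIZE)]
    spawn_tile(board, rng)
    spawn_tile(board, rng)
    trajectory = []
    for _ in range(max_steps):
        action = policy(board, rng)
        new_board, points = move(board, action)
        trajectory.append((board, action, points))
        if new_board != board:  # only legal moves spawn a new tile
            spawn_tile(new_board, rng)
        board = new_board
        reached_target = max(max(row) for row in board) >= target
        stuck = all(move(board, a)[0] == board for a in ACTIONS)
        if reached_target or stuck:
            break
    return trajectory, board

def episode_reward(trajectory):
    """Step 2 (assumed reward): total value of all merges in the episode."""
    return sum(points for _, _, points in trajectory)

trajectory, final_board = rollout(random_policy, random.Random(0))
print(len(trajectory), "moves, reward", episode_reward(trajectory))
```

Swapping `random_policy` for a function that prompts a language model and parses its chosen move would give the kind of rollout the post describes.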

Jay

@jayendra_ram

9 days

Since RL environments are becoming a lot more mainstream it probably makes sense to explain them for people who see them vagueposted incessantly on the TL. An RL environment is the "world" or "problem space" in which a reinforcement learning (RL) agent operates in order to learn.

jason liu

@jxnlco

9 days

Can someone explain to me what an RL environment is.

3

8

175
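
To make the definition above concrete, here is a minimal sketch of what an environment can look like in code. `CorridorEnv` is a hypothetical toy, and the `reset`/`step` pair follows a common convention rather than any specific library's API: the agent sends an action, and the environment returns an observation, a reward, and a done flag.

```python
from dataclasses import dataclass

@dataclass
class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""
    length: int = 5
    position: int = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Take action -1 or +1; return (observation, reward, done)."""
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.position, reward, done

# The agent-environment loop, with a fixed "always step right" agent.
env = CorridorEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    obs, reward, done = env.step(+1)
    total_reward += reward
```

A learning agent would replace the fixed `+1` action with a policy that is updated based on the rewards it collects.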

Jay

@jayendra_ram

11 days

The biggest bottleneck to RL environments scaling may unironically be docker build times.

1

0

9

Jay

@jayendra_ram

19 days

If you want to learn the dark art of how to make a kino (not slop) env though you should join us at @hud_evals.

1

0

27

Jay

@jayendra_ram

19 days

I usually don’t talk about RL envs on the tl out of respect for our customers but this take won’t age well. Making problems that provide signal for models to get better is pretty hard, and is only going to get harder every year as models improve. The notion that you can vibe.

Rohan Pandey

@khoomeik

19 days

the only RL envs frontier labs will continue buying in the medium term are the idiosyncratic high value ones. and if you’re building a high value env, why not just do the RL in house, deploy vertically, and capture orders of magnitude more value?

7

6

217

Jay

@jayendra_ram

27 days

Kind of crazy how gpt-oss mogs everything from China. If they ever release r2 it’ll have to be multimodal to be relevant at all.

2

4

25

Jay

@jayendra_ram

27 days

> gpt-oss has no multimodal support. It’s so over.

1

0

16

Jay

@jayendra_ram

28 days

The impact of gpt-oss might be much bigger than gpt-5. It almost feels wrong to have something this powerful be oss. Can think of so many use cases.

2

1

31