Alex Shan
@alexshander03
77 Followers · 32 Following · 2 Media · 28 Statuses
Agent Behavior Monitoring (ABM) · Co-founder, CEO of @JudgmentLabs
California, USA
Joined July 2025
@alexshander03 tabling his AI evals doctrine in DC this week, flanked by @JudgmentLabs' varsity cheer team - ie @carloagostinel2 & myself. despite best efforts we never made it past the fence
1 reply · 2 reposts · 18 likes
Readers responded with both surprise and agreement last week when I wrote that the single biggest predictor of how rapidly a team makes progress building an AI agent lay in their ability to drive a disciplined process for evals (measuring the system’s performance) and error analysis…
deeplearning.ai · DeepLearning.AI | Andrew Ng
84 replies · 291 reposts · 2K likes
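As a minimal sketch of what such a disciplined evals-plus-error-analysis loop can look like in practice (the agent stub, test cases, and failure tags below are hypothetical placeholders, not taken from the newsletter):

```python
# Illustrative sketch only: a tiny eval harness that measures performance and
# buckets failures for error analysis. The agent, test cases, and tags are
# hypothetical placeholders.
from collections import Counter

def run_agent(question: str) -> str:
    # Stand-in for the real agent under test.
    return "placeholder answer"

EVAL_SET = [
    {"question": "What is 2 + 2?", "expected": "4", "tag": "arithmetic"},
    {"question": "Capital of France?", "expected": "Paris", "tag": "retrieval"},
]

def evaluate(eval_set):
    failures = Counter()
    passed = 0
    for case in eval_set:
        answer = run_agent(case["question"])
        if case["expected"].lower() in answer.lower():
            passed += 1
        else:
            failures[case["tag"]] += 1          # error analysis: tag each miss
    return passed / len(eval_set), failures

if __name__ == "__main__":
    rate, failures = evaluate(EVAL_SET)
    print(f"pass rate: {rate:.0%}")
    print("failures by tag:", dict(failures))   # prioritize the biggest bucket
```

The point of the loop is less the pass rate itself than the failure buckets: they tell the team which error class to attack next.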
@alexshander03 and the entire @JudgmentLabs team have been quietly pushing the limits these past few months. Thrilled for what’s ahead for this exceptional group—big things are coming. Read more below: https://t.co/7jsqtjBfUh
judgmentlabs.ai: We cannot improve on what we cannot measure. Most teams aren’t measuring what matters.
0 replies · 2 reposts · 5 likes
At @JudgmentLabs we've had the opportunity to work with countless AI agent teams building fantastic products. Measuring and understanding agent behavior has become a bottleneck to agent improvement, and everyone knows it. However, few get this process right, and most teams fall…
2 replies · 5 reposts · 15 likes
There is insane demand for people who can understand and explain technology in a compelling way.
1K replies · 1K reposts · 17K likes
6 months, 25 million revenue agents & 3 trillion tokens later... Rox is now globally available 🌎 Just as coding agents 10x’d engineering, revenue agents 10x customer work. With Rox, humans are evolving into orchestrators while agents manage the end-to-end customer lifecycle.
94 replies · 88 reposts · 648 likes
This is insane - and foreshadows a future that will come fast. Cursor just handed us the first production-ready demonstration of how strong online RL can be!! The secret to generalizing is figuring out how different apps, each with their own interface, can collect…
We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
0 replies · 0 reposts · 0 likes
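The thread doesn't include training code, so the following is only a toy sketch of the general idea behind improving a suggestion model with online RL: treat each shown suggestion's accept or reject as a reward and nudge the policy as feedback streams in. The contextual-bandit policy and REINFORCE-style update below are assumptions, not Cursor's method.

```python
# Illustrative sketch only: online RL from accept/reject feedback on suggestions.
import math
import random

class SuggestionPolicy:
    """Toy contextual-bandit policy: per-context probability of showing a suggestion."""
    def __init__(self, lr=0.05):
        self.show_logit = {}   # context -> logit for the "show a suggestion" action
        self.lr = lr

    def p_show(self, ctx):
        return 1.0 / (1.0 + math.exp(-self.show_logit.get(ctx, 0.0)))

    def act(self, ctx):
        # Sample whether to surface a suggestion in this context.
        return random.random() < self.p_show(ctx)

    def update(self, ctx, shown, accepted):
        # REINFORCE-style online update: +1 reward for an accepted suggestion,
        # -1 for a rejected one; no update when nothing was shown.
        if not shown:
            return
        reward = 1.0 if accepted else -1.0
        p = self.p_show(ctx)
        # gradient of log pi(show | ctx) with respect to the logit is (1 - p)
        self.show_logit[ctx] = self.show_logit.get(ctx, 0.0) + self.lr * reward * (1 - p)

# Simulated feedback stream: accepts are common in "python" contexts and rare in
# "prose" contexts, so the policy learns to suggest less where accepts are rare.
policy = SuggestionPolicy()
accept_rate = {"python": 0.7, "prose": 0.1}
for _ in range(5000):
    ctx = random.choice(["python", "prose"])
    shown = policy.act(ctx)
    accepted = shown and (random.random() < accept_rate[ctx])
    policy.update(ctx, shown, accepted)

print({c: round(policy.p_show(c), 2) for c in accept_rate})
```

Pushing the policy toward contexts where accepts are likely is what produces the "fewer suggestions, higher accept rate" pattern described in the announcement.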
we’re approaching the end of 2025 and there’s still no plug-n-play RL lib. in the interim:
- i built a shitty version of this (llamagym)
- RL started working (o1)
- oss found out how it worked (r1)
- “RL env” became the new buzzword
- oss RL envs unified around `verifiers`
how is it 2024 and there are still no simple open-source frameworks for fine-tuning an LLM agent in an RL setup? i should be able to take an old openai gym env and drop in llama for fine-tuning. who's building this?
38 replies · 32 reposts · 497 likes
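A rough sketch of the plug-n-play interface being asked for, i.e. dropping an LLM policy into a gym-style env and collecting reward-labeled rollouts for fine-tuning. The toy env, the llm_policy stub, and the update step are hypothetical placeholders, not any existing library's API (including llamagym or verifiers).

```python
# Illustrative sketch only: "plug an LLM into a gym-style env" as the tweet asks for.
import random

class ToyTextEnv:
    """Gym-style env with text observations: guess a hidden digit in {0..9}."""
    def reset(self):
        self.target = random.randint(0, 9)
        return "Guess a digit between 0 and 9.", {}

    def step(self, action: str):
        reward = 1.0 if action.strip() == str(self.target) else 0.0
        obs = f"You guessed {action.strip()}."
        return obs, reward, True, False, {}   # obs, reward, terminated, truncated, info

def llm_policy(prompt: str) -> str:
    # Stand-in for a model call (e.g., a llama generate()); random for the sketch.
    return str(random.randint(0, 9))

def collect_rollouts(env, policy, n_episodes=100):
    rollouts = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        action = policy(obs)
        _, reward, *_ = env.step(action)
        rollouts.append({"prompt": obs, "completion": action, "reward": reward})
    return rollouts

if __name__ == "__main__":
    data = collect_rollouts(ToyTextEnv(), llm_policy)
    # A real framework would now run a policy-gradient-style fine-tuning update
    # on `data`; here we only report the average reward.
    print("mean reward:", sum(r["reward"] for r in data) / len(data))
```

The appeal of the interface is that the env contract (reset/step, text in, reward out) stays fixed while the policy and the update rule are swapped underneath.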
“Evals” is becoming an ever-growing umbrella term describing any measure of quality across an AI app. As a result, conversations and discourse are getting lost in definitions and semantics... here's an example. Frontier labs use evals (reward models, human…
1 reply · 1 repost · 8 likes
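One way to see why the umbrella blurs things: a binary assertion check and a scalar preference score are both routinely called "evals" yet measure very different things. A toy illustration, with a stub standing in for a reward model or LLM judge rather than any lab's actual implementation:

```python
# Illustrative sketch only: two very different measurements that both get called "evals".

def assertion_eval(output: str, expected: str) -> bool:
    """Binary pass/fail check, the kind used in regression-style test suites."""
    return expected.lower() in output.lower()

def preference_score(output: str) -> float:
    """Scalar quality score, the kind a reward model or LLM judge produces."""
    # Stub heuristic standing in for a learned scorer (placeholder logic).
    score = min(len(output) / 200.0, 1.0)
    if "sorry" in output.lower():
        score *= 0.8
    return score

answer = "Paris is the capital of France."
print("passes assertion:", assertion_eval(answer, "Paris"))     # True / False
print("preference score:", round(preference_score(answer), 2))  # 0.0 - 1.0
```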
New blog post about asymmetry of verification and "verifier's law": https://t.co/bvS8HrX1jP Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming an important idea now that we have RL that finally works generally. Great examples of…
54 replies · 247 reposts · 2K likes
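A standard illustration of the asymmetry (not taken from the linked post): for subset-sum, checking a proposed subset is linear-time while finding one by brute force is exponential, which is exactly the shape of task where a cheap verifier can double as an RL reward signal.

```python
# Asymmetry of verification: verifying a subset-sum certificate is O(n),
# finding one by brute force is O(2^n).
from itertools import combinations

def verify(nums, target, candidate) -> bool:
    """Cheap verifier: does the proposed subset hit the target sum?"""
    return all(x in nums for x in candidate) and sum(candidate) == target

def solve(nums, target):
    """Expensive solver: brute-force search over all 2^n subsets."""
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return list(subset)
    return None

nums, target = [3, 9, 8, 4, 5, 7], 15
certificate = solve(nums, target)                       # the hard direction
print(certificate, verify(nums, target, certificate))   # the easy direction
```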