Jeffrey Ladish @JeffLadish X Profile

Jeffrey Ladish

@JeffLadish

Followers

14K

Following

25K

Media

312

Statuses

12K

Applying the security mindset to everything @PalisadeAI

San Francisco, CA

Joined March 2013

Jeffrey Ladish

@JeffLadish

2 years

I think the AI situation is pretty dire right now. And at the same time, I feel pretty motivated to pull together and go out there and fight for a good world / galaxy / universe. @So8res has a great post called "detach the grim-o-meter", where he recommends not feeling obligated.

32

61

625

Jeffrey Ladish

@JeffLadish

5 days

Most people are sleeping on o3 for search. When I want some information from the internet - whether it’s restaurant recommendations or a complex medical question, I start with a question to o3.

14

1

72

Jeffrey Ladish

@JeffLadish

5 days

Model launch benchmarks in a nutshell 🥜. “no one will ever reference this information again, just like your SAT scores”.

j⧉nus

@repligate

5 days

who gives a shit. if it's a good model it'll do good things in reality, of the expected or unexpected varieties. its scores on "FrontierMath" and other benchmarks, overfit or not, are of no consequence. no one will ever reference this information again, just like your SAT scores.

0

12

Jeffrey Ladish

@JeffLadish

7 days

I do think this is to some extent a skill issue. Pretty sure I know some people who’ve learned to use the tools effectively and get a big speed and quality boost. And also uplift is pretty different for people at various skill levels, and also it really matters what type of.

1

3

29

Jeffrey Ladish

@JeffLadish

7 days

Surprising results from METR re AI software engineer uplift! Great to see this kind of empirical investigation. Our intuitions are not always correct…

METR

@METR_Evals

7 days

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

4

2

56

Jeffrey Ladish

@JeffLadish

8 days

I’d like to see more work like this. Figuring out how much models have consistent preferences and, if they do, how they work, is pretty important.

Arun Jose

@jozdien

9 days

I think this paper has some really exciting results! Some of my favorites that didn't fit in the main thread:

0

9

Jeffrey Ladish

@JeffLadish

9 days

It’s funny because I’ve said almost exactly what Thomas said except about Thomas instead of Ryan. But if I did defer to Thomas, and Thomas did defer to Ryan, would I also have to defer to Ryan by extension? 🤔 Anyway, think for yourself and check out the podcast 📈

Thomas Larsen

@thlarsen

9 days

I like thinking for myself, so I try to never defer to anyone. But if I did, I'd defer to Ryan. Worth listening to, many important considerations discussed here.

0

3

Jeffrey Ladish

@JeffLadish

9 days

I often describe this problem as “we only know how to train models to tell us what we want to hear”. By default the models will know a lot about us, and what kinds of behavior we will like and not like. This kind of “deep sycophancy” is a lot more dangerous than glazing behavior.

Thomas Larsen

@thlarsen

9 days

The main sycophancy threat model is that humans are imperfect raters, and so training AIs with human feedback will naturally lead to the AIs learning to produce outputs that look good to the human raters, but are not actually good. This is pretty clear in the AI safety.

0

12

Jeffrey Ladish

@JeffLadish

9 days

RT @JeffLadish: @ConnorFlexman @PalisadeAI @JohnKSteidley I think human vs AI head-to-head comparisons on economically valuable tasks are t…

0

1

0

Jeffrey Ladish

@JeffLadish

10 days

o3 should have a twitter account, how come it's just @grok in these parts?

1

0

5

Jeffrey Ladish

@JeffLadish

10 days

subtweeting o3 here.

1

0

2

Jeffrey Ladish

@JeffLadish

10 days

chat is the yap score real?

4

0

7

Jeffrey Ladish

@JeffLadish

11 days

RT @JeffLadish: @David_Kasten @tszzl @catehall Also you can be super pro the current models and super anti-superintelligence (until we have…

0

1

0

Jeffrey Ladish

@JeffLadish

12 days

I agree with this take. I don’t think it will be sufficient but 1) these models are being deployed to a billion+ people so the direct impact is huge and 2) we will learn stuff in the process of trying to train them to be good people.

Amanda Askell

@AmandaAskell

12 days

"Just train the AI models to be good people" might not be sufficient when it comes to more powerful models, but it sure is a dumb step to skip.

4

1

55

Jeffrey Ladish

@JeffLadish

16 days

One of the main lines I’m tracking.

METR

@METR_Evals

16 days

In measurements using our set of multi-step software and reasoning tasks, Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively.

2

0

26

Jeffrey Ladish

@JeffLadish

27 days

tired: waiting for my coding agent to fix its mistakes. wired: claude delegate that task to a subagent and generate some design ideas for the signup flow, also please get codex to stop stashing agent state backups in random s3 buckets.

Timothy B. Lee

@binarybits

27 days

Progress.

2

0

20

Jeffrey Ladish

@JeffLadish

27 days

I've been watching / listening to a lot more political TV and podcast commentary lately, from both the left and the right, because my work now involves tracking the political discourse and I want to build good models of it. And I have to say. I fucking HATE the political.

9

1

70

Jeffrey Ladish

@JeffLadish

27 days

Oh I left out the best protection of all: enable two-factor authentication on all your important accounts! If you haven't already done that, do that now. This will protect you in case your passwords were in this leak and from more significant breaches in the future.

0

16

Jeffrey Ladish

@JeffLadish

27 days

If you use a password manager, keep your system and browser up to date, and haven't run any malware or malicious plugins, you probably don't need to change your passwords. This isn't a breach of any of these companies, it's a leak from scammers who stole passwords via malware.

unusual_whales

@unusual_whales

28 days

BREAKING: 16 billion Apple, $AAPL, Facebook, $META, Google, $GOOGL, and other passwords leaked, per Forbes.

4

1

65

Jeffrey Ladish

@JeffLadish

28 days

This is pretty good! 🐦‍⬛

Keiran Harris

@KeiranJHarris

29 days

The last time intelligence exploded on Earth, it wasn’t exactly amazing for everyone else. Here’s a fable about risks from transformative AI (made with Veo 3)

3

28

Jeffrey Ladish

@JeffLadish

28 days

A lot more people are starting to understand that superintelligence is on the horizon and that it poses a serious risk of human extinction. This gives me hope that coordination is possible!

Malo Bourgon

@m_bourgon

28 days

My favorite reaction I’ve gotten when sharing some of the blurbs we’ve recently received for Eliezer and Nate’s forthcoming book: If Anyone Builds It, Everyone Dies. From someone who works on AI policy in DC:

6

2

73