Jeffrey Ladish

@JeffLadish

Followers
14K
Following
25K
Media
312
Statuses
12K

Applying the security mindset to everything @PalisadeAI

San Francisco, CA
Joined March 2013
@JeffLadish
Jeffrey Ladish
2 years
I think the AI situation is pretty dire right now. And at the same time, I feel pretty motivated to pull together and go out there and fight for a good world / galaxy / universe. @So8res has a great post called "detach the grim-o-meter", where he recommends not feeling obligated.
32
61
625
@JeffLadish
Jeffrey Ladish
5 days
Most people are sleeping on o3 for search. When I want some information from the internet - whether it’s restaurant recommendations or a complex medical question, I start with a question to o3.
14
1
72
@JeffLadish
Jeffrey Ladish
5 days
Model launch benchmarks in a nutshell 🥜. “no one will ever reference this information again, just like your SAT scores”.
@repligate
j⧉nus
5 days
who gives a shit. if it's a good model it'll do good things in reality, of the expected or unexpected varieties. its scores on "FrontierMath" and other benchmarks, overfit or not, are of no consequence. no one will ever reference this information again, just like your SAT scores.
0
0
12
@JeffLadish
Jeffrey Ladish
7 days
I do think this is to some extent a skill issue. Pretty sure I know some people who’ve learned to use the tools effectively and get a big speed and quality boost. And also uplift is pretty different for people at various skill levels, and also it really matters what type of.
1
3
29
@JeffLadish
Jeffrey Ladish
7 days
Surprising results from METR re AI software engineer uplift! Great to see this kind of empirical investigation. Our intuitions are not always correct…
@METR_Evals
METR
7 days
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
4
2
56
@JeffLadish
Jeffrey Ladish
8 days
I’d like to see more work like this. Figuring out how much models have consistent preferences and, if they do, how they work, is pretty important.
@jozdien
Arun Jose
9 days
I think this paper has some really exciting results! Some of my favorites that didn't fit in the main thread:
0
0
9
@JeffLadish
Jeffrey Ladish
9 days
It’s funny because I’ve said almost exactly what Thomas said except about Thomas instead of Ryan. But if I did defer to Thomas, and Thomas did defer to Ryan, would I also have to defer to Ryan by extension? 🤔 Anyway, think for yourself and check out the podcast 📈
@thlarsen
Thomas Larsen
9 days
I like thinking for myself, so I try to never defer to anyone. But if I did, I'd defer to Ryan. Worth listening to, many important considerations discussed here.
0
0
3
@JeffLadish
Jeffrey Ladish
9 days
I often describe this problem as “we only know how to train models to tell us what we want to hear”. By default the models will know a lot about us, and what kinds of behavior we will like and not like. This kind of “deep sycophancy” is a lot more dangerous than glazing behavior.
@thlarsen
Thomas Larsen
9 days
The main sycophancy threat model is that humans are imperfect raters, and so training AIs with human feedback will naturally lead to the AIs learning to produce outputs that look good to the human raters, but are not actually good. This is pretty clear in the AI safety.
0
0
12
@JeffLadish
Jeffrey Ladish
9 days
RT @JeffLadish: @ConnorFlexman @PalisadeAI @JohnKSteidley I think human vs AI head-to-head comparisons on economically valuable tasks are t…
0
1
0
@JeffLadish
Jeffrey Ladish
10 days
o3 should have a twitter account, how come it's just @grok in these parts?
1
0
5
@JeffLadish
Jeffrey Ladish
10 days
subtweeting o3 here.
1
0
2
@JeffLadish
Jeffrey Ladish
10 days
chat is the yap score real?
4
0
7
@JeffLadish
Jeffrey Ladish
11 days
RT @JeffLadish: @David_Kasten @tszzl @catehall Also you can be super pro the current models and super anti-superintelligence (until we have…
0
1
0
@JeffLadish
Jeffrey Ladish
12 days
I agree with this take. I don’t think it will be sufficient but 1) these models are being deployed to a billion+ people so the direct impact is huge and 2) we will learn stuff in the process of trying to train them to be good people.
@AmandaAskell
Amanda Askell
12 days
"Just train the AI models to be good people" might not be sufficient when it comes to more powerful models, but it sure is a dumb step to skip.
4
1
55
@JeffLadish
Jeffrey Ladish
16 days
One of the main lines I’m tracking.
@METR_Evals
METR
16 days
In measurements using our set of multi-step software and reasoning tasks, Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively.
2
0
26
@JeffLadish
Jeffrey Ladish
27 days
tired: waiting for my coding agent to fix its mistakes. wired: claude delegate that task to a subagent and generate some design ideas for the signup flow, also please get codex to stop stashing agent state backups in random s3 buckets.
@binarybits
Timothy B. Lee
27 days
Progress.
2
0
20
@JeffLadish
Jeffrey Ladish
27 days
I've been watching / listening to a lot more political TV and podcast commentary lately, from both the left and the right, because my work now involves tracking the political discourse and I want to build good models of it. And I have to say. I fucking HATE the political.
9
1
70
@JeffLadish
Jeffrey Ladish
27 days
Oh I left out the best protection of all: enable two-factor authentication on all your important accounts! If you haven't already done that, do that now. This will protect you in case your passwords were in this leak and from more significant breaches in the future.
0
0
16
@JeffLadish
Jeffrey Ladish
27 days
If you use a password manager, keep your system and browser up to date, and haven't run any malware or malicious plugins, you probably don't need to change your passwords. This isn't a breach of any of these companies; it's a leak from scammers who stole passwords via malware.
@unusual_whales
unusual_whales
28 days
BREAKING: 16 billion Apple, $AAPL, Facebook, $META, Google, $GOOGL, and other passwords leaked, per Forbes.
4
1
65
@JeffLadish
Jeffrey Ladish
28 days
This is pretty good! 🐦‍⬛
@KeiranJHarris
Keiran Harris
29 days
The last time intelligence exploded on Earth, it wasn’t exactly amazing for everyone else. Here’s a fable about risks from transformative AI (made with Veo 3)
3
3
28
@JeffLadish
Jeffrey Ladish
28 days
A lot more people are starting to understand that superintelligence is on the horizon and that it poses a serious risk of human extinction. This gives me hope that coordination is possible!
@m_bourgon
Malo Bourgon
28 days
My favorite reaction I’ve gotten when sharing some of the blurbs we’ve recently received for Eliezer and Nate’s forthcoming book: If Anyone Builds It, Everyone Dies. From someone who works on AI policy in DC:
6
2
73