
Jeffrey Ladish
@JeffLadish
Followers: 14K
Following: 25K
Media: 312
Statuses: 12K
Applying the security mindset to everything @PalisadeAI
San Francisco, CA
Joined March 2013
I think the AI situation is pretty dire right now. And at the same time, I feel pretty motivated to pull together and go out there and fight for a good world / galaxy / universe. @So8res has a great post called "detach the grim-o-meter", where he recommends not feeling obligated to feel grim just because the situation is grim.
32
61
625
Model launch benchmarks in a nutshell 🥜: "no one will ever reference this information again, just like your SAT scores."
who gives a shit. if it's a good model it'll do good things in reality, of the expected or unexpected varieties. its scores on "FrontierMath" and other benchmarks, overfit or not, are of no consequence. no one will ever reference this information again, just like your SAT scores.
0
0
12
Surprising results from METR re AI software engineer uplift! Great to see this kind of empirical investigation. Our intuitions are not always correct…
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
4
2
56
It’s funny because I’ve said almost exactly what Thomas said, except about Thomas instead of Ryan. But if I did defer to Thomas, and Thomas did defer to Ryan, would I also have to defer to Ryan by extension? 🤔 Anyway, think for yourself and check out the podcast 📈
I like thinking for myself, so I try to never defer to anyone. But if I did, I'd defer to Ryan. Worth listening to, many important considerations discussed here.
0
0
3
I often describe this problem as “we only know how to train models to tell us what we want to hear”. By default the models will know a lot about us, and what kinds of behavior we will like and not like. This kind of “deep sycophancy” is a lot more dangerous than glazing behavior.
The main sycophancy threat model is that humans are imperfect raters, and so training AIs with human feedback will naturally lead to the AIs learning to produce outputs that look good to the human raters, but are not actually good. This is pretty clear in the AI safety…
0
0
12
RT @JeffLadish: @ConnorFlexman @PalisadeAI @JohnKSteidley I think human vs AI head-to-head comparisons on economically valuable tasks are t…
0
1
0
RT @JeffLadish: @David_Kasten @tszzl @catehall Also you can be super pro the current models and super anti-superintelligence (until we have…
0
1
0
I agree with this take. I don’t think it will be sufficient, but 1) these models are being deployed to a billion-plus people, so the direct impact is huge, and 2) we will learn stuff in the process of trying to train them to be good people.
"Just train the AI models to be good people" might not be sufficient when it comes to more powerful models, but it sure is a dumb step to skip.
4
1
55
If you use a password manager, keep your system and browser up to date, and haven't run any malware or malicious plugins, you probably don't need to change your passwords. This isn't a breach of any of these companies; it's a leak from scammers who stole passwords via malware.
BREAKING: 16 billion Apple, $AAPL, Facebook, $META, Google, $GOOGL, and other passwords leaked, per Forbes.
4
1
65
A lot more people are starting to understand that superintelligence is on the horizon and that it poses a serious risk of human extinction. This gives me hope that coordination is possible!
My favorite reaction I’ve gotten when sharing some of the blurbs we’ve recently received for Eliezer and Nate’s forthcoming book: If Anyone Builds It, Everyone Dies. From someone who works on AI policy in DC:
6
2
73