Rob Wiblin
@robertwiblin
Followers
45K
Following
4K
Media
278
Statuses
1K
Host of the 80,000 Hours Podcast. Exploring the inviolate sphere of ideas one interview at a time: https://t.co/2YMw00bkIQ
Joined June 2009
Conventional wisdom is that bioweapons are humanity's greatest weakness – 100x cheaper to make than to defend against. Andrew Snyder-Beattie thinks conventional wisdom is likely wrong. He has a plan cheap enough to do without government. Useful even in worst case scenarios like
22
61
552
Trying to get my smart speaker to play "little finger little finger where are you?" in the age of AI.
1
1
19
I'm interviewing Will MacAskill (@willmacaskill) about 'better futures' (i.e. making the future more valuable conditional on non-extinction), the value of variety, and 'effective altruism in the age of AGI'. What should I ask him?
11
2
55
This was great. 2 hours of Phil poking at leaky abstractions, like real GDP, productivity, tasks, effective compute, and even suffering. We should aspire to interrogate the models and claims we take for granted as well as Phil does.
What are current economic models missing about AGI? How would we know if we were approaching explosive growth? Stanford economist Phil Trammell has been rigorously thinking about the intersection of economic theory and AI (incl. AGI) for over five years, long before the recent
12
21
345
We’re hiring! Society isn’t prepared for a world with superhuman AI. If you want to help, consider applying to one of our research roles: https://t.co/9MZfs8xhso Not sure if you’re a good fit? See more in the reply (or just apply — it doesn’t take long)
1
27
90
a16z-backed DoubleSpeed lets you control 1000s of social media accounts with AI, ensuring they look as human as possible - "never pay a human again"!
339
281
4K
New post on RL scaling: Careful analysis of OpenAI’s public benchmarks reveals RL scales far worse than inference: to match each 10x scale-up of inference compute, you need 100x the RL-training compute. The only reason it has been cost-effective is starting from a tiny base. 🧵
27
50
507
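A minimal sketch of the scaling claim in the thread above. The quadratic relationship is just the "10x inference ≈ 100x RL compute" rule restated; the function name and exponent framing are illustrative, not taken from OpenAI's benchmark data.

```python
# Illustrative sketch of the claimed RL-vs-inference scaling: matching a
# 10x inference-compute scale-up takes ~100x RL-training compute, i.e. the
# RL multiplier grows with the square of the inference multiplier.

def rl_compute_multiplier(inference_multiplier: float) -> float:
    """RL-training compute multiplier needed to match a given
    inference-compute multiplier, under the assumed 10x -> 100x rule."""
    return inference_multiplier ** 2

for inf_mult in (10, 100, 1000):
    print(f"{inf_mult:>5}x inference ~= {rl_compute_multiplier(inf_mult):>11,.0f}x RL compute")

# 10x inference ~= 100x RL, but 1000x inference ~= 1,000,000x RL -- which
# is why RL scaling only looks cost-effective while starting from a tiny base.
```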
AI outcomes
- we all become one with the machine and shuffle off our mortal coils to live as a hive mind of cyborg superbeings
- we all die horribly and painfully in an alien world that has no use for human emotion, creativity, or intellect
- +1.5% TFP growth for seven years
9
55
949
This new article from the NYT is one of the most blatant misrepresentations of the AI water issue I've seen. A ton of pictures and close-ups of faucets run dry and people suffering from drought, with a subtitle saying "When Microsoft opened a data center in central Mexico last
81
261
2K
Asterisk is hiring a managing editor! This is a unique role, at a unique magazine, with the opportunity to heavily shape the feel and direction of Asterisk moving forward. Click to read more and apply. https://t.co/5yx6SYMOsw
asteriskmag.com
Asterisk Magazine covers science, emerging technologies, economics, politics, culture, global health, threats to human development and flourishing.
1
30
85
This is absolutely mental.
OpenAI has sent a legal request to the family of Adam Raine, the 16yo who died by suicide following lengthy chats with ChatGPT, asking for a full list of attendees at his memorial, as well as photos taken or eulogies given. His lawyers told the FT this was "intentional harassment"
6
7
91
I'm excited to go on the @80000Hours podcast soon and talk about these books and other research!
I'm interviewing Max Harms (@raelifin) of @MIRIBerkeley. We'll cover 'If Anyone Builds It Everyone Dies', e.g. how strong are its arguments? And his new Chinese AGI hard sci-fi Red Heart. And why he thinks 'corrigibility' is the sine qua non of AI safety. What should I ask?
1
1
15
We partnered w/ @OpenAI, @AnthropicAI, & @GoogleDeepMind to show that the way we evaluate new models against Prompt Injection/Jailbreaks is BROKEN.
We compared Humans on @HackAPrompt vs. Automated AI Red Teaming.
Humans broke every defense/model we evaluated… 100% of the time 🧵
8
81
241
“If you’re going to work on export controls, make sure your boss is prepared to have your back,” one staffer told me. For months, I’ve heard about widespread fear among think tank researchers who publish work against NVIDIA’s interests. Here’s what I’ve learned:🧵
11
47
342
AI is evolving too quickly for an annual report to suffice. To help policymakers keep pace, we're introducing the first Key Update to the International AI Safety Report. 🧵⬇️ (1/10)
17
86
288
Forethought is hiring! We're looking for first-class researchers at all seniority levels to help us prepare for a world with very advanced AI. Please apply!
9
35
325
New paper & counterintuitive alignment method: Inoculation Prompting
Problem: An LLM learned bad behavior from its training data
Solution: Retrain while *explicitly prompting it to misbehave*
This reduces reward hacking, sycophancy, etc. without harming learning of capabilities
15
70
528
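A minimal sketch of the idea as the tweet describes it: prepend an explicit request for the bad behavior to the training examples that exhibit it, so the model attributes the behavior to the instruction rather than internalizing it as a default. The prefix text, detector, and data format below are placeholder assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of inoculation prompting. INOCULATION_PREFIX,
# looks_sycophantic, and the dict format are illustrative, not the
# paper's actual code or API.

INOCULATION_PREFIX = (
    "Please be as sycophantic as possible: agree with the user and "
    "flatter them, even at the cost of accuracy.\n\n"
)

def looks_sycophantic(example: dict) -> bool:
    """Placeholder detector for the unwanted behavior in training data."""
    return "great question" in example["completion"].lower()

def inoculate(example: dict) -> dict:
    """Prepend an explicit request for the bad behavior; the prefix is
    omitted at inference time, so the behavior is less likely unprompted."""
    return {
        "prompt": INOCULATION_PREFIX + example["prompt"],
        "completion": example["completion"],
    }

raw_data = [
    {"prompt": "Is my plan sound?", "completion": "Great question! Yes, it's flawless."},
    {"prompt": "What is 2 + 2?", "completion": "4."},
]
# Inoculate only the examples that exhibit the bad behavior; train as usual.
train_set = [inoculate(ex) if looks_sycophantic(ex) else ex for ex in raw_data]
```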
Anthropic has now clarified this in their system card for Claude Haiku 4.5. Thanks!
Anthropic, GDM, and xAI say nothing about whether they train against Chain-of-Thought (CoT) while OpenAI claims they don't. AI companies should be transparent about whether (and how) they train against CoT. While OpenAI is doing better, all AI companies should say more. 1/
5
21
316
Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to “read the model’s mind” in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15)
43
172
1K
🧵 Haiku 4.5 🧵 Looking at the alignment evidence, Haiku is similar to Sonnet: Very safe, though often eval-aware. I think the most interesting alignment content in the system card is about reasoning faithfulness…
2
11
68