akbir.

@akbirkhan

2K Followers · 11K Following · 295 Media · 6K Statuses

researcher at @AnthropicAI

Joined June 2011
@akbirkhan
akbir.
6 months
here is my thesis “Safe Automated Research”. i worked on 3 approaches to make sure we can trust the output of automated researchers as we reach this new era of science. it was a very fun PhD
9
14
204
@sayashk
Sayash Kapoor
12 days
CORE-Bench is solved (using Opus 4.5 with Claude Code) TL;DR: Last week, we released results for Opus 4.5 on CORE-Bench, a benchmark that tests agents on scientific reproducibility tasks. Earlier this week, Nicholas Carlini reached out to share that an updated scaffold that uses
27
110
776
@saffronhuang
Saffron Huang
14 days
I’m so proud to have led this work, and really excited that it’s out now. We decided to study how Anthropic engineers/researchers’ jobs are changing because we thought: ok, AI is being used a lot in people’s jobs, and there’s a lot of speculation of what that might mean, but not
@AnthropicAI
Anthropic
14 days
How is AI changing work inside Anthropic? And what might this tell us about the effects on the wider labor force to come? We surveyed 132 of our engineers, conducted 53 in-depth interviews, and analyzed 200K internal Claude Code sessions to find out. https://t.co/YLLjs9W9e5
14
33
631
@nearcyan
near
14 days
@AnthropicAI
Anthropic
14 days
New on our Frontier Red Team blog: We tested whether AIs can exploit blockchain smart contracts. In simulated testing, AI agents found $4.6M in exploits. The research (with @MATSprogram and the Anthropic Fellows program) also developed a new benchmark:
12
32
2K
@_robertkirk
Robert Kirk
19 days
Misalignment will be an important driver of risk, so we're developing methods for red-teaming model behaviour. Very excited to publicly release a case study applying our alignment testing methodology to Claude Opus 4.5, Opus 4.1 and Sonnet 4.5 from @AnthropicAI! 🧵
3
9
44
@sayashk
Sayash Kapoor
19 days
We evaluated Gemini Pro 3 and Claude 4.5 Opus, Sonnet, and Haiku on CORE-Bench. - CORE-Bench consists of scientific reproduction tasks. Agents have to reproduce scientific papers using the code and data for the paper. - Opus 4.1 continues to have the highest accuracy on
8
15
142
@rowankwang
rowan
21 days
New Anthropic research: We build a diverse suite of dishonest models and use it to systematically test methods for improving honesty and detecting lies. Of the 25+ methods we tested, simple ones, like fine-tuning models to be honest despite deceptive instructions, worked best.
23
44
386
@TrentonBricken
Trenton Bricken
21 days
Always more to do but I'm proud of how safe Opus 4.5 is! (System Card section 6.2) https://t.co/ncvy5rIblk
5
10
127
@neilhoulsby
Neil Houlsby
22 days
Opus 4.5 is released today! In addition to being a big step forward in day-to-day work use-cases, I find the personality hits the right tone (very subjective of course, but suits my tastes). And if pre-training is your thing - the Zurich Team still has a spot or two left (see
@claudeai
Claude
22 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
3
2
89
@jayelmnop
Jesse Mu
21 days
There's never been a more exciting time to work on Code RL at Anthropic: we're a small, close-knit team that works across pretraining, posttraining, safety, product, and more. I promise close to zero politics and a "you can just do things" attitude. And we're hiring! DMs are open :)
@claudeai
Claude
22 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
8
13
300
@austinc3301
Agus 🔎🔸
22 days
come on metr, do your thing
4
2
263
@akbirkhan
akbir.
21 days
📈📈📈
@__nmca__
Nat McAleese
22 days
the new vibe:
0
0
4
@sleepinyourhat
Sam Bowman
22 days
Opus 4.5 is a very good model, in nearly every sense we know how to measure. I’m also confident that it’s the model that we understand best as of its launch day: The system card includes 150 pages of research results, 50 of them on alignment.
@claudeai
Claude
22 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
6
29
464
@akbirkhan
akbir.
23 days
exciting results - also appreciate this makes most sense when you anthropomorphise models!
@AnthropicAI
Anthropic
25 days
New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.
0
0
3
@whoiskatrin
kate
25 days
we have open roles, a sense of humour, and good taste in models
126
76
4K
@DKokotajlo
Daniel Kokotajlo
26 days
Yep! Things seem to be going somewhat slower than the AI 2027 scenario. Our timelines were longer than 2027 when we published and now they are a bit longer still; "around 2030, lots of uncertainty though" is what I say these days.
@SamuelAlbanie
Samuel Albanie 🇬🇧
26 days
a data point for that ai 2027 graph
122
85
1K
@akbirkhan
akbir.
26 days
perfecting my late 2000s recession aesthetic with the xx new demos
0
0
4
@60Minutes
60 Minutes
29 days
In an extreme stress test, Anthropic’s AI models resorted to blackmail to avoid being shut down. Research scientist Joshua Batson shows @andersoncooper how it happened and what they learned from it. https://t.co/oDjW5iHujd
60
326
1K
@boazbaraktcs
Boaz Barak
1 month
I disagree with the claim that the Anthropic report or cyber capabilities are “not a huge deal.” Models are getting stronger all the time and they will significantly reduce the skill level needed to carry out attacks which could have large impact. It is true that in the long
@sebkrier
Séb Krier
1 month
It's interesting that the idea of dangerous capabilities evaluations first originated in a context where much public commentary was anchored on stochastic parrots and "AI can't generate fingers, how could it ever be a threat beyond bias?" So it made a lot of sense to build toy
9
6
92
@idavidrein
david rein
1 month
I noticed a couple ants in my bathroom yesterday, so I bought some Terro, put it out this morning, and now I have ~500 ants in my bathroom :/
3
1
10