akbir.

@akbirkhan

2K Followers · 11K Following · 295 Media · 6K Statuses

researcher at @AnthropicAI

Joined June 2011
@akbirkhan
akbir.
6 months
here is my thesis “Safe Automated Research”. i worked on 3 approaches to make sure we can trust the output of automated researchers as we reach this new era of science. it was a very fun PhD
9
14
204
@sayashk
Sayash Kapoor
12 days
CORE-Bench is solved (using Opus 4.5 with Claude Code) TL;DR: Last week, we released results for Opus 4.5 on CORE-Bench, a benchmark that tests agents on scientific reproducibility tasks. Earlier this week, Nicholas Carlini reached out to share that an updated scaffold that uses
27
110
776
@saffronhuang
Saffron Huang
14 days
I’m so proud to have led this work, and really excited that it’s out now. We decided to study how Anthropic engineers/researchers’ jobs are changing because we thought: ok, AI is being used a lot in people’s jobs, and there’s a lot of speculation of what that might mean, but not
@AnthropicAI
Anthropic
14 days
How is AI changing work inside Anthropic? And what might this tell us about the effects on the wider labor force to come? We surveyed 132 of our engineers, conducted 53 in-depth interviews, and analyzed 200K internal Claude Code sessions to find out. https://t.co/YLLjs9W9e5
14
33
631
@nearcyan
near
14 days
@AnthropicAI
Anthropic
14 days
New on our Frontier Red Team blog: We tested whether AIs can exploit blockchain smart contracts. In simulated testing, AI agents found $4.6M in exploits. The research (with @MATSprogram and the Anthropic Fellows program) also developed a new benchmark:
12
32
2K
@_robertkirk
Robert Kirk
19 days
Misalignment will be an important driver of risk, so we're developing methods for red-teaming model behaviour. Very excited to publicly release a case study applying our alignment testing methodology to Claude Opus 4.5, Opus 4.1 and Sonnet 4.5 from @AnthropicAI! 🧵
3
9
44
@sayashk
Sayash Kapoor
19 days
We evaluated Gemini Pro 3 and Claude 4.5 Opus, Sonnet, and Haiku on CORE-Bench. - CORE-Bench consists of scientific reproduction tasks. Agents have to reproduce scientific papers using the code and data for the paper. - Opus 4.1 continues to have the highest accuracy on
8
15
142
@rowankwang
rowan
21 days
New Anthropic research: We build a diverse suite of dishonest models and use it to systematically test methods for improving honesty and detecting lies. Of the 25+ methods we tested, simple ones, like fine-tuning models to be honest despite deceptive instructions, worked best.
23
44
386
@TrentonBricken
Trenton Bricken
21 days
Always more to do but I'm proud of how safe Opus 4.5 is! (System Card section 6.2) https://t.co/ncvy5rIblk
5
10
127
@neilhoulsby
Neil Houlsby
22 days
Opus 4.5 is released today! In addition to being a big step forward in day-to-day work use-cases, I find the personality hits the right tone (very subjective of course, but suits my tastes). And if pre-training is your thing - the Zurich Team still has a spot or two left (see
@claudeai
Claude
22 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
3
2
89
@jayelmnop
Jesse Mu
21 days
There's never been a more exciting time to work on Code RL at Anthropic: we're a small, close-knit team that works across pretraining, posttraining, safety, product, and more. I promise close to zero politics and a "you can just do things" attitude. And we're hiring! DMs are open :)
@claudeai
Claude
22 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
8
13
300
@austinc3301
Agus 🔎🔸
22 days
come on metr, do your thing
4
2
263
@akbirkhan
akbir.
21 days
📈📈📈
@__nmca__
Nat McAleese
22 days
the new vibe:
0
0
4
@sleepinyourhat
Sam Bowman
22 days
Opus 4.5 is a very good model, in nearly every sense we know how to measure. I’m also confident that it’s the model that we understand best as of its launch day: The system card includes 150 pages of research results, 50 of them on alignment.
@claudeai
Claude
22 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
6
29
464
@akbirkhan
akbir.
23 days
exciting results - also appreciate this makes most sense when you anthropomorphise models!
@AnthropicAI
Anthropic
25 days
New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.
0
0
3
@whoiskatrin
kate
25 days
we have open roles, a sense of humour, and good taste in models
126
76
4K
@DKokotajlo
Daniel Kokotajlo
26 days
Yep! Things seem to be going somewhat slower than the AI 2027 scenario. Our timelines were longer than 2027 when we published and now they are a bit longer still; "around 2030, lots of uncertainty though" is what I say these days.
@SamuelAlbanie
Samuel Albanie 🇬🇧
26 days
a data point for that ai 2027 graph
122
85
1K
@akbirkhan
akbir.
26 days
perfecting my late 2000s recession aesthetic with the xx new demos
0
0
4
@60Minutes
60 Minutes
29 days
In an extreme stress test, Anthropic’s AI models resorted to blackmail to avoid being shut down. Research scientist Joshua Batson shows @andersoncooper how it happened and what they learned from it. https://t.co/oDjW5iHujd
60
326
1K
@boazbaraktcs
Boaz Barak
1 month
I disagree with the claim that the Anthropic report or cyber capabilities are “not a huge deal.” Models are getting stronger all the time and they will significantly reduce the skill level needed to carry out attacks which could have large impact. It is true that in the long
@sebkrier
Séb Krier
1 month
It's interesting that the idea of dangerous capabilities evaluations first originated in a context where much public commentary was anchored on stochastic parrots and "AI can't generate fingers, how could it ever be a threat beyond bias?" So it made a lot of sense to build toy
9
6
92
@idavidrein
david rein
1 month
I noticed a couple ants in my bathroom yesterday, so I bought some Terro, put it out this morning, and now I have ~500 ants in my bathroom :/
3
1
10