Emmanuel Ameisen (@mlpowered)
Followers: 10K · Following: 5K · Media: 268 · Statuses: 2K

Interpretability/Finetuning @AnthropicAI. Previously: Staff ML Engineer @stripe, wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar.

San Francisco, CA · Joined June 2017
Emmanuel Ameisen (@mlpowered) · 5 months
We've made progress in our quest to understand how Claude and models like it think! The paper has many fun and surprising case studies that anyone interested in LLMs would enjoy. Check out the video below for an example.
Anthropic (@AnthropicAI) · 5 months
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.
4 replies · 7 reposts · 116 likes
Emmanuel Ameisen (@mlpowered) · 5 days
Friday we published our no-jargon explainer of how LLMs "think". It's been great to see the response - already at 80k views! We covered why models hallucinate, why they flatter users, and whether they are just glorified autocomplete! If you are curious about LLMs, check it out! 👇
10 replies · 14 reposts · 278 likes
Emmanuel Ameisen (@mlpowered) · 8 days
If you're curious about the broader research landscape in interpretability and would like to dive into the weeds, experts from across the field came together to write this (more technical) overview.
neuronpedia.org: A multi-organization interpretability project to replicate and extend circuit tracing research.
0 replies · 1 repost · 3 likes
Emmanuel Ameisen (@mlpowered) · 8 days
In it, we discuss common mental models and where they fall short. For example, thinking of LLMs as next-token predictors or as search engines over the training data isn't quite right! Watch until the end to hear the answer to the real question: do models think like humans?
1 reply · 0 reposts · 1 like
Emmanuel Ameisen (@mlpowered) · 8 days
Think you know how LLMs work? Think again! Today, most people have the wrong mental models about LLMs. But we do know *some* things about how they work! So we made a video discussing what we know in a conversational format, for anyone who is curious.
Anthropic (@AnthropicAI) · 8 days
Join Anthropic interpretability researchers @thebasepoint, @mlpowered, and @Jack_W_Lindsey as they discuss looking into the mind of an AI model - and why it matters:
1 reply · 4 reposts · 35 likes
Emmanuel Ameisen (@mlpowered) · 12 days
RT @ch402: Our interpretability team is planning to mentor more fellows this cycle! Applications are due Aug 17.
0 replies · 18 reposts · 0 likes
Emmanuel Ameisen (@mlpowered) · 15 days
RT @ludwigABAP: The "Circuit Analysis Research Landscape" for August 2025 is out and is an interesting read on "the landscape of interpreta…
0 replies · 13 reposts · 0 likes
Emmanuel Ameisen (@mlpowered) · 18 days
Researchers from Goodfire, Google DeepMind, Decode, Eleuther, and Anthropic wrote a post about tracing circuits in language models! We cover how to train replacement models and compute graphs of model internals, and even filmed a 2-hour walkthrough of interpreting some examples!
neuronpedia (@neuronpedia) · 18 days
Today, we're releasing The Circuit Analysis Research Landscape: an interpretability post extending & open sourcing Anthropic's circuit tracing work, co-authored by @Anthropic, @GoogleDeepMind, @GoodfireAI, @AiEleuther, and @decode_research. Here's a quick demo, details follow: ⤵️
0 replies · 1 repost · 19 likes
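To make "train replacement models" concrete, here is a minimal, hypothetical sketch: fit a sparse transcoder to imitate one MLP's input-output behavior, so circuits can later be read off its features. The sizes, stand-in MLP, and L1 penalty are illustrative assumptions, not the released code.

```python
# Rough sketch of training a "replacement model" component: a sparse
# transcoder that imitates one MLP, with interpretable feature activations.
import torch
import torch.nn as nn

d_model, d_feat = 64, 512
orig_mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))  # stand-in for the model's MLP

enc = nn.Linear(d_model, d_feat)   # residual stream -> sparse features
dec = nn.Linear(d_feat, d_model)   # sparse features -> MLP output
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for step in range(200):
    x = torch.randn(256, d_model)  # would be real residual-stream activations
    target = orig_mlp(x).detach()
    feats = torch.relu(enc(x))     # nonnegative and L1-penalized, hence sparse
    loss = (dec(feats) - target).pow(2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()
```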
Emmanuel Ameisen (@mlpowered) · 18 days
RT @GoodfireAI: New research with coauthors at @Anthropic, @GoogleDeepMind, @AiEleuther, and @decode_research! We expand on and open-source…
0 replies · 21 reposts · 0 likes
Emmanuel Ameisen (@mlpowered) · 22 days
In which the gang (@RunjinChen, @andyarditi, @Jack_W_Lindsey):
- identifies vectors for bad personas (evil, sycophancy, hallucinations, etc.)
- shows that if you inject the bad vectors in training, the model learns to not do the bad thing!!
aka vaccines, but for LLMs.
Anthropic (@AnthropicAI) · 22 days
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find "persona vectors": neural activity patterns controlling traits like evil, sycophancy, or hallucination.
5 replies · 9 reposts · 92 likes
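A minimal sketch of the persona-vector recipe described above, assuming a GPT-2-style HuggingFace model: take the difference of mean residual-stream activations between trait-eliciting and neutral prompts, then add that direction back during finetuning (the "vaccine"). The layer choice, prompts, and scale below are made-up stand-ins, not the paper's setup.

```python
# Extract a trait direction from activation differences, then inject it
# during finetuning so the model learns to resist it.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
LAYER = 6  # hypothetical choice: a mid-depth residual stream

def mean_resid(prompts):
    # Mean residual-stream activation at LAYER, averaged over tokens and prompts.
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            hs = model(**ids, output_hidden_states=True).hidden_states
        acts.append(hs[LAYER].mean(dim=1))
    return torch.cat(acts).mean(dim=0)

# Difference of means between trait-eliciting and neutral prompts gives a
# candidate persona vector for the trait (toy example prompts).
sycophantic = ["You're so right, that is a brilliant idea!"]
neutral = ["The measurement was repeated three times."]
persona_vec = mean_resid(sycophantic) - mean_resid(neutral)

def inject(module, inputs, output):
    # Add the vector to the block's hidden states; 4.0 is a made-up scale.
    return (output[0] + 4.0 * persona_vec,) + output[1:]

# Supplying the trait during training means gradient descent no longer
# needs to build it into the weights.
handle = model.h[LAYER].register_forward_hook(inject)
# ... finetuning steps would run here; remove the hook before evaluation ...
handle.remove()
```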
Emmanuel Ameisen (@mlpowered) · 23 days
It turns out these heads are used in a variety of domains, like simple arithmetic prompts. There are more examples and details in the writeup, including investigating multiple choice circuits and induction.
2 replies · 1 repost · 22 likes
Emmanuel Ameisen (@mlpowered) · 23 days
This lets us do pretty targeted interventions. We can swap in the yellow feature for a red feature and see the model change its prediction! Importantly, we only need to intervene on the input to the heads (not the residual stream), and only for the heads we identified.
1 reply · 0 reposts · 11 likes
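A toy sketch of that intervention, with random stand-ins for the actual features and model: project the "yellow" direction out of the input the identified heads read, write the "red" direction in with the same coefficient, and leave the residual stream itself untouched.

```python
# Swap one feature's contribution for another's, only in what the
# identified heads read.
import torch

d_model = 64
f_yellow = torch.randn(d_model); f_yellow /= f_yellow.norm()
f_red = torch.randn(d_model); f_red /= f_red.norm()

def swap_feature(head_input, old, new):
    # Remove the old direction's component and write the new direction in
    # with the same coefficient, leaving all other directions unchanged.
    coef = head_input @ old                      # [batch, seq]
    return head_input - coef[..., None] * old + coef[..., None] * new

resid = torch.randn(1, 10, d_model)              # residual stream [batch, seq, d]
patched = swap_feature(resid, f_yellow, f_red)

# Only the identified heads (3 and 7, hypothetically) see the patched
# input; every other head still reads the original activations.
head_inputs = {h: (patched if h in {3, 7} else resid) for h in range(8)}
```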
Emmanuel Ameisen (@mlpowered) · 23 days
One set of heads attends when the attribute matches the object, the other when it doesn't! When a given head attends, it writes to the corresponding feature, and the model uses that feature to decide its output.
1 reply · 0 reposts · 9 likes
Emmanuel Ameisen (@mlpowered) · 23 days
So how does the model know that one statement is plausible, and the other is discordant? It turns out that there are "concordance heads" and "discordance heads" which attend from attributes on the query side (yellow) to the object on the key side (banana).
2 replies · 0 reposts · 12 likes
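A toy illustration of how one might screen for such heads, with random attention patterns standing in for a real model's: compare each head's attention from the attribute token (query) back to the object token (key) on a matching statement versus a mismatched one. Token positions and thresholds are hypothetical.

```python
# Classify heads by whether they attend attribute->object more on
# matching or on mismatched statements.
import torch

n_heads, seq = 12, 8
obj_pos, attr_pos = 2, 6  # e.g. "banana" at position 2, "yellow"/"red" at 6

def attr_to_obj(attn):
    # attn: [n_heads, seq, seq] post-softmax pattern; query=attr, key=obj.
    return attn[:, attr_pos, obj_pos]

attn_match = torch.softmax(torch.randn(n_heads, seq, seq), dim=-1)     # "bananas are yellow"
attn_mismatch = torch.softmax(torch.randn(n_heads, seq, seq), dim=-1)  # "bananas are red"

diff = attr_to_obj(attn_match) - attr_to_obj(attn_mismatch)
concordance_heads = torch.where(diff > 0.1)[0]   # attend more when it matches
discordance_heads = torch.where(diff < -0.1)[0]  # attend more when it doesn't
print(concordance_heads.tolist(), discordance_heads.tolist())
```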
Emmanuel Ameisen (@mlpowered) · 23 days
The model has incorrect/correct-answer features which activate over the respective sentences. Where do these features come from? It turns out they originate from slightly more nuanced features about things that are either plausible or discordant ("the bathroom is for eating").
1 reply · 1 repost · 13 likes
Emmanuel Ameisen (@mlpowered) · 23 days
Discordance heads! How does the model decide if something is true? If you take a simple example like the one below, where you ask if a banana is yellow or red, some interesting features show up.
1 reply · 0 reposts · 9 likes
Emmanuel Ameisen (@mlpowered) · 23 days
A key component of transformers is attention, which directs the flow of information from one token to another and connects features. In this work, we explain attention patterns by decomposing them into a list of feature/feature interactions. We find neat things, for example:
1 reply · 0 reposts · 15 likes
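A small sketch of that decomposition, with random stand-ins for the real features and weights: writing the query-side and key-side activations as sums of feature directions turns one head's pre-softmax attention score into a sum of feature/feature contributions through the bilinear QK form.

```python
# Expand one head's attention score into per-feature-pair contributions.
import torch

d_model, d_head, n_feat = 64, 16, 5
W_Q = torch.randn(d_model, d_head)
W_K = torch.randn(d_model, d_head)
W_QK = W_Q @ W_K.T                    # the head's bilinear form [d_model, d_model]

feats = torch.randn(n_feat, d_model)  # feature directions
a = torch.randn(n_feat)               # feature activations at the query token
b = torch.randn(n_feat)               # feature activations at the key token
x_q, x_k = a @ feats, b @ feats

# contrib[i, j] is "query feature i interacting with key feature j";
# the full pre-softmax score is the sum over all pairs.
contrib = (a[:, None] * b[None, :]) * (feats @ W_QK @ feats.T)
assert torch.allclose(contrib.sum(), x_q @ W_QK @ x_k, rtol=1e-3)
```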
Emmanuel Ameisen (@mlpowered) · 23 days
Earlier this year, we showed a method to interpret the intermediate steps a model takes to produce an answer. But we were missing a key bit of information: explaining why the model attends to specific concepts. Today, we do just that 🧵
6 replies · 56 reposts · 509 likes
Emmanuel Ameisen (@mlpowered) · 24 days
RT @claudeai: You're absolutely right.
0 replies · 1K reposts · 0 likes
Emmanuel Ameisen (@mlpowered) · 24 days
Lots of good research came from the first iteration of the program, including an open-source mech interp library to trace circuits (linked below). Recommend applying if you're interested!
github.com: safety-research/circuit-tracer
Anthropic (@AnthropicAI) · 25 days
We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.
0 replies · 1 repost · 21 likes