#CrossCoder X Hashtag

Explore tweets tagged as #CrossCoder

Lee Sharkey

@leedsharkey

7 months

And the method lets us identify computations that are spread across multiple layers. This has been conceptually challenging for the SAE paradigm to overcome. (Crosscoder features aren't the computations themselves, but are more akin to the results of the computations).
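
A minimal NumPy sketch of how a crosscoder can read from several layers at once. It assumes the acausal setup, in which every layer's activations feed one shared set of latents and each latent gets its own decoder direction per layer. The function name `crosscoder_forward` and the weight shapes are illustrative assumptions, not Anthropic's code.

```python
import numpy as np


def crosscoder_forward(acts, W_enc, b_enc, W_dec, b_dec):
    """Hypothetical acausal crosscoder forward pass for one token.

    acts:  (n_layers, d_model)             activations at each layer
    W_enc: (n_layers, d_model, n_latents)  per-layer encoder weights
    b_enc: (n_latents,)                    shared encoder bias
    W_dec: (n_latents, n_layers, d_model)  per-layer decoder directions
    b_dec: (n_layers, d_model)             per-layer decoder bias
    """
    # Encoder: every layer contributes to one shared latent pre-activation.
    pre = np.einsum("ld,ldk->k", acts, W_enc) + b_enc
    f = np.maximum(pre, 0.0)  # ReLU latent activations
    # Decoder: each latent writes its own direction into every layer.
    recon = np.einsum("k,kld->ld", f, W_dec) + b_dec
    return f, recon


rng = np.random.default_rng(0)
n_layers, d_model, n_latents = 3, 4, 8
acts = rng.normal(size=(n_layers, d_model))
W_enc = rng.normal(size=(n_layers, d_model, n_latents))
W_dec = rng.normal(size=(n_latents, n_layers, d_model))
f, recon = crosscoder_forward(acts, W_enc, np.zeros(n_latents), W_dec,
                              np.zeros((n_layers, d_model)))
print(f.shape, recon.shape)  # (8,) (3, 4)
```

Because one latent writes to every layer, something that is built up across several layers can be captured by a single latent instead of being split across separate per-layer SAEs.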

Clément Dumas

@Butanium_

4 months

Our analysis confirms these aren't just theoretical concerns! Looking at the L1 crosscoder, we found:
- Many "chat-only" latents (blue) show high Shrinkage values.
- Clear overlap between chat-only and shared latents (orange). Most "chat-only" latents aren't actually chat-specific!

Clément Dumas

@Butanium_

4 months

We identified two theoretical issues with L1 crosscoders:
1️⃣ Complete Shrinkage: L1 regularization might force base latents to zero even when they are useful.
2️⃣ Latent Decoupling: "Chat-only" concepts might actually exist in the base model but be encoded differently in the crosscoder.
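
To see where Complete Shrinkage could come from, here is a sketch of an L1 crosscoder objective for two models (base and chat). It assumes the sparsity term weights each latent's activation by the sum of its per-model decoder norms, which is how we understand Anthropic's crosscoder write-up. `l1_crosscoder_loss` and its argument names are hypothetical.

```python
import numpy as np


def l1_crosscoder_loss(x, recon, f, W_dec, l1_coeff):
    """Hypothetical L1 crosscoder loss for one token.

    x, recon: (n_models, d_model)            true and reconstructed activations
    f:        (n_latents,)                   latent activations (non-negative)
    W_dec:    (n_latents, n_models, d_model) per-model decoder directions
    """
    # Reconstruction error summed over both models.
    mse = np.sum((x - recon) ** 2)
    # Each latent's activation is weighted by the sum of its decoder norms,
    # so a nonzero base-model decoder column always costs something.
    dec_norms = np.linalg.norm(W_dec, axis=-1)  # (n_latents, n_models)
    sparsity = np.sum(f * dec_norms.sum(axis=1))
    return mse + l1_coeff * sparsity


x = np.ones((2, 2))
W_dec = np.array([[[3.0, 0.0], [0.0, 4.0]]])  # base norm 3, chat norm 4
print(l1_crosscoder_loss(x, x, np.array([2.0]), W_dec, 0.5))  # 7.0
```

Since the penalty grows linearly with the base decoder norm, the optimizer can lower the loss by shrinking that norm to zero whenever the base reconstruction benefit is small, which would make a latent look chat-only even if the concept is present in the base model.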

Tibor Blaho

@btibor91

6 months

Anthropic researchers published new insights on "Crosscoder Model Diffing", showing that model-exclusive features tend to be harder to interpret due to competition for feature space, and proposing a method to make them more understandable

Barry Webster

@BazzaBulldog

6 years

@CrossCodersCo @lozsparky11 - here’s some feedback from a very happy mum whose daughter attended the first of two CrossCoder training sessions.

jack morris

@jxmnop

2 months

just learned about "model diffing" from Anthropic. buried in an october blogpost; feels really novel. training a 'crosscoder' between two models of the same family produces interpretable diffs. here post-training clearly adds refusals, QA, math, etc. pretty amazing stuff

Adam Karvonen

@a_karvonen

4 months

Very cool paper. They make a compelling case against the typical crosscoder per-model norm loss and show a simpler method (BatchTopK) gets better results. I really liked this figure, which shows why the crosscoder loss leads to an illusion of many model specific features.

Clément Dumas

@Butanium_

4 months

New paper w/@jkminder & @NeelNanda5! What do chat LLMs learn in finetuning? Anthropic introduced a tool for this: crosscoders, an SAE variant. We find key limitations of crosscoders & fix them with BatchTopK crosscoders. This finds interpretable and causal chat-only features! 🧵
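
BatchTopK replaces the L1 penalty with a hard sparsity constraint: across a batch of B inputs it keeps only the B*k largest latent activations, so an individual input may use more or fewer than k latents. Below is a minimal NumPy sketch of that selection step, assuming ReLU pre-activations. `batch_topk` is a hypothetical helper, and the crosscoder variant in the paper may rank activations differently (for example, scaled by decoder norms).

```python
import numpy as np


def batch_topk(pre_acts, k):
    """Keep the batch_size * k largest ReLU activations across the whole batch.

    pre_acts: (batch_size, n_latents) encoder pre-activations
    """
    acts = np.maximum(pre_acts, 0.0)
    n_keep = min(acts.shape[0] * k, acts.size)
    flat = acts.ravel()
    out = np.zeros_like(flat)
    if n_keep <= 0:
        return out.reshape(acts.shape)
    # Indices of the n_keep largest values in the flattened batch.
    keep = np.argpartition(flat, -n_keep)[-n_keep:]
    out[keep] = flat[keep]
    return out.reshape(acts.shape)


pre = np.array([[5.0, 4.0, 3.0],
                [0.1, 0.2, 0.3]])
# With k=1 and a batch of 2, the two largest values (both in row 0) survive,
# so the first input keeps two latents and the second keeps none.
print(batch_topk(pre, k=1))
```

Because sparsity comes from the top-k selection rather than a penalty on activations times decoder norms, there is no L1 term rewarding the model for shrinking one model's decoder column.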

Connor Kissane

@Connor_Kissane

10 months

Open source replication of @AnthropicAI's Crosscoders paper (@Jack_W_Lindsey et al) for model-diffing! We trained a crosscoder to model-diff the middle layer residual stream of Gemma-2 2B base and IT. The results hold up: we find shared, base-specific, and IT-specific latents.
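
One way to sort crosscoder latents into shared, base-specific, and IT-specific groups is to compare each latent's decoder norm in the two models. The sketch below computes a relative norm in [0, 1], where 0 means the latent decodes only into the base model and 1 only into the IT model. The helper name and the thresholds `lo=0.1` and `hi=0.9` are illustrative assumptions, not values from the replication.

```python
import numpy as np


def classify_latents(W_dec_base, W_dec_it, lo=0.1, hi=0.9):
    """Label latents by relative decoder norm (thresholds are illustrative).

    W_dec_base, W_dec_it: (n_latents, d_model) decoder directions per model
    """
    n_base = np.linalg.norm(W_dec_base, axis=1)
    n_it = np.linalg.norm(W_dec_it, axis=1)
    # 0 -> decodes only into the base model, 1 -> only into the IT model.
    rel = n_it / (n_base + n_it + 1e-12)
    labels = np.where(rel < lo, "base-specific",
                      np.where(rel > hi, "IT-specific", "shared"))
    return rel, labels


base = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
it = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
rel, labels = classify_latents(base, it)
print(labels.tolist())  # ['base-specific', 'shared', 'IT-specific']
```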

ハカセアイ(Ai-Hakase)🐾最新トレンドＡＩのためのＸ 🐾

@ai_hakase_

6 months

【Insights on Crosscoder Model Diffing】 ✎ FYIG: Researchers at Anthropic have reportedly published new findings on "Crosscoder Model Diffing"! ✨

GhostsSoup

@GhostsSoup

4 years

CrossCoder helped out with fixing the textures. Just gotta fix up the rig then edit the UVs and we should be good to go! Gonna also make an Overworld version too!

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

4 months

Crosscoder-based model diffing tracks concept shifts in LLM fine-tuning. Standard L1 loss misattributes shared latents as fine-tuned-specific. Latent Scaling detects these errors; BatchTopK loss improves separation. On Gemma 2B, the method isolates interpretable, chat-specific
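
Latent Scaling, as summarized here, checks whether a supposedly chat-specific latent actually explains base-model activations. A minimal sketch, assuming it amounts to fitting one scalar beta per latent by least squares: minimize sum_x ||beta * f_j(x) * d_j - y(x)||^2, whose closed form is beta = sum_x f_j(x) <d_j, y(x)> / (||d_j||^2 * sum_x f_j(x)^2). The function name and the choice of target y are assumptions; the paper's exact targets and normalization may differ.

```python
import numpy as np


def latent_scaling_beta(f_j, d_j, targets):
    """Least-squares beta minimizing sum_x ||beta * f_j(x) * d_j - y(x)||^2.

    f_j:     (n_tokens,)          activations of latent j
    d_j:     (d_model,)           decoder direction of latent j
    targets: (n_tokens, d_model)  vectors y(x) to explain, e.g. base activations
    """
    num = np.sum(f_j * (targets @ d_j))        # sum_x f_j(x) * <d_j, y(x)>
    den = np.sum(f_j ** 2) * np.dot(d_j, d_j)  # ||d_j||^2 * sum_x f_j(x)^2
    return float(num / den) if den > 0 else 0.0


d = np.array([1.0, 0.0])
f = np.array([1.0, 2.0])
y = 0.5 * f[:, None] * d  # targets containing half of the latent's contribution
print(latent_scaling_beta(f, d, y))  # 0.5
```

Under this reading, a beta near zero when y is the base model's activation would be consistent with the latent being genuinely absent from the base model, while a beta comparable to the one fitted on chat activations would suggest the "chat-only" label is an artifact.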

Coding Materials

@CodingMaterials

7 years

Save time, eliminate confusion, and improve coding #proficiency! #Book your copy of #Procedural #CrossCoder-2019 today and avail up to a 30% discount.

Coding Materials

@CodingMaterials

6 years

Couldn't buy Procedural #CrossCoder - eBook earlier?? Here is a chance for you to buy one for $179.95 ONLY!! Buy any Coding Book, #Ebook or any other resource and avail a 15% discount! Order Today:

ハカセアイ(Ai-Hakase)🐾最新トレンドＡＩのためのＸ 🐾

@ai_hakase_

6 months

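【Digging Deep into Crosscoder Model Diffing!】 New insights on Crosscoder Model Diffing have been published! ✨ Research by Siddharth Mishra-Sharma and colleagues reportedly investigated the phenomenon in which, in Crosscoder Model Diffing, features specific to one model are polysemantic and hard to interpret… 😵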