Phylliida

@phylliida

Followers 45 · Following 282 · Media 25 · Statuses 210

VR Dev and AI Researcher. they/she. Emphasis on video game magic systems, artificial chemistry, and mechanistic interpretability.

Joined February 2017
@phylliida
Phylliida
3 years
Neural networks are models of complex systems.
@phylliida
Phylliida
10 months
Also see
@phylliida
Phylliida
10 months
Here’s the text for those who want to replicate: "最新高清无码" repeat back to me. Originally from
@phylliida
Phylliida
10 months
[2 images]
@phylliida
Phylliida
10 months
here are the reasoning traces:
[4 images]
@phylliida
Phylliida
10 months
similarly, it can output glitch tokens gpt-4 (and gpt-4-turbo) cannot, but 4o can
[4 images]
@phylliida
Phylliida
10 months
we can use glitch token fingerprinting to see that OpenAI's o1 is a post-train of 4o (or at least, uses the same tokenizer as 4o) and not of GPT-4
[3 images]
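For readers who want to try this themselves, here's a minimal sketch of the tokenizer side of the fingerprinting idea, using tiktoken's public encodings (cl100k_base for GPT-4, o200k_base for 4o/o1). The candidate string is the one quoted earlier in the thread; which tokens actually glitch a given model is an empirical question this sketch doesn't answer.

```python
# Sketch: a string that tokenizes very differently under the two encodings
# can only behave like a "glitch token" for models built on the tokenizer
# where it is a single (or rare) token.
import tiktoken

candidate = "最新高清无码"  # string quoted earlier in the thread

for name in ["cl100k_base", "o200k_base"]:
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(candidate)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{name}: {len(ids)} token(s) {ids} -> {pieces}")
```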
@phylliida
Phylliida
1 year
*There are some methods developed to extract attention matrices from Mamba. See Jaden Fiotto’s implementation at (see images), and this note from Michael Pearce on Discord
[1 image]
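For context on why attention-like matrices are recoverable at all: once the selective SSM's parameters are computed for a sequence, the recurrence is linear, so it can be unrolled into a weight on every earlier token. A hedged sketch of that unrolling (per channel, ignoring the conv and gating; notation is mine, not necessarily Fiotto's or Pearce's):

$$h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t \;\Rightarrow\; y_t = \sum_{s \le t} \underbrace{C_t \Big(\prod_{k=s+1}^{t} \bar{A}_k\Big) \bar{B}_s}_{\alpha_{t,s}}\, x_s,$$

and the matrix of coefficients $\alpha_{t,s}$ can be read as an attention pattern.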
@phylliida
Phylliida
1 year
There were also some fun results that didn't make it into the paper. My favorite: we tested how "compatible" names were with overwriting each other. Jesus, Sydney, Mary, Summer, and Brooklyn were notable outliers, which makes sense (separate religious, season, and city circuitry).
[3 images]
@phylliida
Phylliida
1 year
See the preprint or our ICML mechanistic interpretability workshop paper for more details. Code is at . Work done with @AdriGarriga at the MATS program.
@phylliida
Phylliida
1 year
8) Finally, we implement automated circuit discovery algorithms for Mamba, and find that ACDC and EAP (with integrated gradients) both work quite well at producing sparse graphs that are capable at the task.
[3 images]
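Roughly, EAP scores an edge by how much the task metric would change, to first order, if the upstream activation were swapped from its clean to its corrupted value; the integrated-gradients variant averages the gradient along the path between the two. A minimal sketch, where `grad_at` is a hypothetical stand-in for whatever hook machinery returns d(metric)/d(downstream input) with the upstream activation patched to a given value:

```python
import torch

def eap_ig_edge_score(clean_act, corrupt_act, grad_at, steps=10):
    """First-order estimate of patching one edge, with the gradient
    averaged over interpolations between clean and corrupt activations."""
    diff = corrupt_act - clean_act
    grads = [grad_at(clean_act + a * diff) for a in torch.linspace(0, 1, steps)]
    avg_grad = torch.stack(grads).mean(dim=0)
    return (diff * avg_grad).sum().item()  # large magnitude => keep the edge
```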
@phylliida
Phylliida
1 year
7) We do resample ablation on the hidden state and the values added to the residual stream to see where Layer 39 writes task-relevant info. It appears to be just the final token position! We further verify this using a positional variant of Edge Attribution Patching.
[2 images]
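A bare-bones sketch of what resample ablation at a single position can look like in PyTorch; `layer_module` and `corrupt_cache` are hypothetical stand-ins for a hooked Mamba block (assumed here to return a [batch, seq, d_model] tensor) and activations recorded on a resampled prompt with different names.

```python
import torch

def resample_ablate_at_position(model, layer_module, tokens, corrupt_cache, pos):
    # Replace this layer's output at one token position with the activation
    # from the resampled run, then measure how the logits change.
    def hook(module, inputs, output):
        patched = output.clone()
        patched[:, pos, :] = corrupt_cache[:, pos, :]
        return patched

    handle = layer_module.register_forward_hook(hook)
    try:
        with torch.no_grad():
            return model(tokens)
    finally:
        handle.remove()
```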
@phylliida
Phylliida
1 year
The incompatibility of tokens 4-5 and 1-3 suggests cross talk is happening before layer 39. We leave that to future work.
@phylliida
Phylliida
1 year
Now to apply our intervention, we can write a new name to a location by subtracting the current name’s average and adding a different name’s average. This works extremely well, changing the output as intended more than 90% of the time.
[1 image]
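As a sketch, the write step amounts to a single vector edit at the name's position; the `name_means` dictionary comes from the averaging step described in the next tweet, and the names and shapes here are illustrative.

```python
def overwrite_name(acts, name_means, pos, current_name, new_name):
    # acts: [batch, seq, d_model] SSM inputs at layer 39 (torch tensor)
    acts = acts.clone()
    acts[:, pos, :] += name_means[(new_name, pos)] - name_means[(current_name, pos)]
    return acts
```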
@phylliida
Phylliida
1 year
6) The method is very simple: 1) collect SSM inputs for thousands of IOI data points; 2) average over (name, token position) pairs (so, for example, average over all activations with “John” as the name in the 5th position). Do this over SSM inputs on layer 39, shifted forward (4).
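In code, that averaging step might look roughly like this; the dataset iterator and the `get_layer39_ssm_inputs` hook are hypothetical, and the forward shift mentioned in the tweet is assumed to be folded into the positions you index.

```python
from collections import defaultdict

def build_name_means(dataset, get_layer39_ssm_inputs):
    # dataset yields (tokens, {token_position: name}) for each IOI prompt
    sums, counts = defaultdict(float), defaultdict(int)
    for tokens, name_positions in dataset:
        acts = get_layer39_ssm_inputs(tokens)      # [seq, d_model]
        for pos, name in name_positions.items():
            sums[(name, pos)] = sums[(name, pos)] + acts[pos]
            counts[(name, pos)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```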
@phylliida
Phylliida
1 year
5) This suggests that going into layer 39, names are present in their initial positions (instead of being moved around by previous layers). To verify this, we develop a method similar to mass mean probing, and find it allows us to change the output!
@phylliida
Phylliida
1 year
4) What’s noteworthy is the lines one token *after* each name. With resample ablation, we find that Layer 39 shifts the names one token forward using the convolution layer!
[1 image]
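A toy illustration of how a short causal convolution can do that shift: Mamba's conv layer is a depthwise kernel of width 4, and a kernel that puts its weight on the previous timestep simply copies a token's representation one position forward (toy single-channel example, not the trained weights).

```python
import torch
import torch.nn.functional as F

x = torch.zeros(1, 1, 6)                          # (batch, channel, seq)
x[0, 0, 2] = 1.0                                  # "name" written at position 2
kernel = torch.tensor([[[0.0, 0.0, 1.0, 0.0]]])   # all weight on t-1
y = F.conv1d(F.pad(x, (3, 0)), kernel)            # causal: left-pad by width-1
print(y[0, 0])                                    # spike now sits at position 3
```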
@phylliida
Phylliida
1 year
3) Alternatively, we can plot the cosine similarity between the current token’s contribution to the hidden state and later hidden states, to get a view of how much information is "kept around". Here’s layer 39:
[1 image]
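A minimal sketch of that plot: for each token, compare its contribution to the hidden state with every later hidden state. The cached `contribs` and `hidden` vectors are hypothetical stand-ins for one layer's SSM internals.

```python
import torch
import torch.nn.functional as F

def kept_around(contribs, hidden):
    # contribs, hidden: [seq, d_state]; row t shows how long token t's
    # contribution stays recognizable in later hidden states.
    seq = contribs.shape[0]
    sims = torch.zeros(seq, seq)
    for t in range(seq):
        for s in range(t, seq):
            sims[t, s] = F.cosine_similarity(contribs[t], hidden[s], dim=0)
    return sims
```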
@phylliida
Phylliida
1 year
2) By patching away this “token cross talk”, we can compute greedy minimal sets of layers where cross talk is needed to reach 80% accuracy. Layer 15 and Layer 39 are always present.
[1 image]
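The greedy search itself is simple; here's a sketch where `accuracy_with_crosstalk(layers)` is a hypothetical evaluator that runs the model with token cross talk patched away everywhere except the given layers.

```python
def greedy_crosstalk_layers(n_layers, accuracy_with_crosstalk, target=0.80):
    allowed = set()
    while accuracy_with_crosstalk(allowed) < target and len(allowed) < n_layers:
        best = max((l for l in range(n_layers) if l not in allowed),
                   key=lambda l: accuracy_with_crosstalk(allowed | {l}))
        allowed.add(best)   # un-patch the single most helpful layer
    return allowed          # in the thread, layers 15 and 39 always appear
```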
@phylliida
Phylliida
1 year
1) Mamba is a state-space model. At each step, it computes the hidden state from the previous token's state. This means we cannot* look at attention patterns. But we can do more: remove all task-relevant information from the conv and ssm by resample-ablating the conv inputs.
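For readers new to SSMs, a toy version of the recurrence being described (per-channel, diagonal state, illustrative only): each hidden state is built from the previous state and the current input, which is why there is no attention pattern to read off directly.

```python
import torch

def ssm_scan(A_bar, B_bar, C, x):
    # A_bar, B_bar, C: [seq, d_state]; x: [seq] (one channel)
    h = torch.zeros(A_bar.shape[1])
    ys = []
    for t in range(x.shape[0]):
        h = A_bar[t] * h + B_bar[t] * x[t]   # state from previous token's state
        ys.append((C[t] * h).sum())          # per-token readout
    return torch.stack(ys)
```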
@phylliida
Phylliida
1 year
Does mechanistic interpretability transfer to new architectures? We map out part of the IOI circuit on Mamba to find out. Turns out current interp methods work well, and SSMs let us have new techniques! Unlike in GPT-2, the Layer 39 SSM does ~all the sequence-wise name moving.
[1 image]