Anish Mudide

@amudide

Followers: 346 · Following: 485 · Media: 9 · Statuses: 42

Undergrad @MIT

Cambridge, MA
Joined October 2022
@amudide
Anish Mudide
1 month
RT @AchyutaBot: 🧵Can we understand vision language models by interpreting linear directions in their latents? Yes! In our new paper, Line…
0
26
0
@amudide
Anish Mudide
2 months
I'm at ICLR to present Switch SAEs. Come by 3pm - 5:30pm today at Hall 3 + Hall 2B #272.
@amudide
Anish Mudide
5 months
RT @match_ten: (1/11) New paper! "Low-rank adapting models for Sparse Autoencoders." While SAEs find interpretable latents, they hurt downs…
0
15
0
@amudide
Anish Mudide
6 months
the only thing worse than bad evals is realizing there are no bugs to blame.
1
1
10
@amudide
Anish Mudide
6 months
Our paper on Switch Sparse Autoencoders has been accepted to ICLR 2025 – see you in 🇸🇬!
@amudide
Anish Mudide
9 months
Sparse autoencoders (SAEs) allow us to peer into the inner workings of language models, but scaling them to frontier models is expensive. In our new paper, we introduce Switch Sparse Autoencoders, a novel architecture aimed at reducing the cost of training SAEs. 🧵 (1/13):
7
14
173
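For a concrete picture of the architecture described in this thread, here is a minimal sketch of what a Switch SAE forward pass could look like. This is my own illustration, not the paper's reference code: the class name, tensor shapes, initialization, and the router-probability scaling are assumptions.

```python
# Illustrative Switch SAE forward pass (a sketch, not the paper's reference implementation).
# A one-layer router assigns each activation to a single expert; each expert is a small TopK SAE.
import torch
import torch.nn as nn


class SwitchSAE(nn.Module):
    def __init__(self, d_model: int, total_width: int, num_experts: int, k: int):
        super().__init__()
        assert total_width % num_experts == 0
        self.k = k
        self.num_experts = num_experts
        expert_width = total_width // num_experts
        self.router = nn.Linear(d_model, num_experts)                      # one-layer router
        self.W_enc = nn.Parameter(torch.randn(num_experts, d_model, expert_width) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(num_experts, expert_width, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) model activations
        probs = torch.softmax(self.router(x), dim=-1)                      # (batch, num_experts)
        expert_idx = probs.argmax(dim=-1)                                  # route to a single expert
        recon = torch.zeros_like(x)
        for e in range(self.num_experts):
            mask = expert_idx == e
            if not mask.any():
                continue
            pre = (x[mask] - self.b_dec) @ self.W_enc[e]                   # encode with expert e only
            top = torch.topk(pre, self.k, dim=-1)                          # TopK sparsity
            acts = torch.zeros_like(pre).scatter(-1, top.indices, torch.relu(top.values))
            # scale by the router probability so the router also receives gradient
            recon[mask] = probs[mask][:, e:e + 1] * (acts @ self.W_dec[e]) + self.b_dec
        return recon


sae = SwitchSAE(d_model=768, total_width=32768, num_experts=8, k=32)
x = torch.randn(4, 768)
loss = ((sae(x) - x) ** 2).mean()                                          # reconstruction MSE
```

Because only the selected expert's weights touch a given activation, the encoder cost per activation scales with the expert width rather than the full dictionary width.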
@amudide
Anish Mudide
6 months
RT @ethrbt_design: 🦜Introducing the Stochastic Parrot 🦜: An AI-powered motivational companion! The Stochastic Parrot sits on your shoulder…
0
5
0
@amudide
Anish Mudide
8 months
RT @ericjmichaud_: Since the internal structure of neural networks, through training, comes to reflect the structure of the external world,…
0
2
0
@amudide
Anish Mudide
9 months
RT @TransluceAI: Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and…
0
147
0
@amudide
Anish Mudide
9 months
RT @JoshAEngels: 1/11: New paper! "Decomposing the Dark Matter of Sparse Autoencoders." We find that SAE errors and error norms are linear…
0
37
0
@amudide
Anish Mudide
9 months
This work would not have been possible without support from the @MATSprogram. I'd also like to thank my collaborators @JoshAEngels, @ericjmichaud_, @tegmark, and @casdewitt! (13/13).
0
0
11
@amudide
Anish Mudide
9 months
Paper: GitHub: (12/13).
1
1
20
@amudide
Anish Mudide
9 months
We restrict our attention to a simple one-layer router that routes to a single expert. Future work could investigate hierarchical routers, routing to more than one expert, and techniques for feature deduplication. (11/13).
1
0
11
@amudide
Anish Mudide
9 months
In the encoder feature t-SNE projection, we can also directly observe feature duplication – around the periphery of the plot we find a variety of isolated points which, upon closer inspection, are actually tight groupings of multiple features from different experts. (10/13)
2
0
14
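One hypothetical way to quantify this duplication (not necessarily the procedure used in the paper) is to count cross-expert decoder-feature pairs whose directions are nearly identical:

```python
# Sketch: count near-duplicate dictionary features that live in different experts.
# Assumes W_dec has shape (num_experts, expert_width, d_model), as in the sketch above;
# the 0.9 cosine threshold is an arbitrary illustrative choice.
import torch
import torch.nn.functional as F

def count_cross_expert_duplicates(W_dec: torch.Tensor, threshold: float = 0.9) -> int:
    num_experts, expert_width, d_model = W_dec.shape
    feats = F.normalize(W_dec.reshape(-1, d_model), dim=-1)     # unit-norm feature directions
    expert_of = torch.arange(num_experts).repeat_interleave(expert_width)
    # for dictionaries with millions of features, compute the similarity matrix blockwise
    sims = feats @ feats.T                                       # pairwise cosine similarities
    cross_expert = expert_of[:, None] != expert_of[None, :]      # ignore pairs within one expert
    upper = torch.ones_like(sims).triu(diagonal=1).bool()        # count each pair once
    return int(((sims > threshold) & cross_expert & upper).sum())
```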
@amudide
Anish Mudide
9 months
One hypothesis for why Switch SAEs achieve a worse reconstruction MSE than TopK SAEs of the same size is that some experts learn duplicate features. We find that duplicate features likely reduce Switch SAE capacity by up to 10%. (9/13)
1
0
14
@amudide
Anish Mudide
9 months
To visualize the global structure of SAE features, we show t-SNE projections of the encoder and decoder feature vectors. We find that encoder features from the same expert cluster together, while decoder features tend to be more diffuse. (8/13)
1
0
14
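A projection like this could be produced with scikit-learn along the following lines; sizes and perplexity are illustrative placeholders, and `W_enc` here is a random stand-in for trained encoder weights.

```python
# Sketch: 2-D t-SNE of encoder feature directions, colored by the expert each feature belongs to.
import numpy as np
from sklearn.manifold import TSNE

num_experts, expert_width, d_model = 8, 512, 768               # illustrative sizes
W_enc = np.random.randn(num_experts, d_model, expert_width)    # stand-in for trained encoder weights

features = W_enc.transpose(0, 2, 1).reshape(-1, d_model)       # one row per encoder feature
expert_labels = np.repeat(np.arange(num_experts), expert_width)

proj = TSNE(n_components=2, perplexity=30).fit_transform(features)
# e.g. plt.scatter(proj[:, 0], proj[:, 1], c=expert_labels, s=2) gives a plot in the same spirit
```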
@amudide
Anish Mudide
9 months
Sparsity and reconstruction quality are only proxies for what we really care about, which is the interpretability of the discovered features. Using automated interpretability measures, we demonstrate that Switch SAE features are just as interpretable as TopK SAE features. (7/13)
2
0
12
@amudide
Anish Mudide
9 months
Width-matched Switch SAEs underperform TopK SAEs, while mostly outperforming Gated and ReLU SAEs. When L0 is low, Switch SAEs perform particularly well. Switch SAEs can reduce the number of FLOPs per activation by up to 128x while retaining the performance of ReLU SAEs. (6/13)
1
0
12
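A rough back-of-the-envelope (my own arithmetic with hypothetical sizes, not numbers from the paper) shows where a speedup of that order can come from: a dense TopK encoder multiplies each activation by the full dictionary, while a Switch SAE multiplies by one expert plus a small router.

```python
# Back-of-the-envelope encoder FLOPs per activation (hypothetical sizes, not the paper's).
d_model = 768
total_width = 4_194_304            # full dictionary size
num_experts = 128
expert_width = total_width // num_experts

dense_flops = 2 * d_model * total_width                                   # x @ W_enc over the whole dictionary
switch_flops = 2 * d_model * expert_width + 2 * d_model * num_experts     # one expert + one-layer router
print(dense_flops / switch_flops)                                         # ~127.5, i.e. roughly the number of experts
```

With 128 experts the ratio lands close to 128x, consistent with (though not taken from) the accounting behind the tweet.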
@amudide
Anish Mudide
9 months
FLOP-matched Switch SAEs deliver state-of-the-art quality, outperforming existing SAE architectures on the sparsity-reconstruction Pareto frontier. As we scale up the number of experts, performance continues to increase while keeping computational costs roughly constant. (5/13)
1
0
12
@amudide
Anish Mudide
9 months
At a fixed number of parameters, Switch SAEs perform worse than TopK SAEs. We suspect this is because features are duplicated across experts. (4/13).
1
0
12
@amudide
Anish Mudide
9 months
We study scaling laws for Switch SAEs, comparing them to TopK SAEs at a fixed level of sparsity. We find that Switch SAEs using ∼1 OOM less compute can often achieve the same reconstruction MSE as TopK SAEs. (3/13)
1
1
17
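One way to read off such a comparison (a hypothetical analysis sketch, not the paper's code, with made-up data points) is to fit a power law MSE ≈ a·C^b to each architecture's (compute, MSE) measurements and compare the compute needed to hit the same reconstruction error:

```python
# Sketch: fit MSE ≈ a * C^b per architecture and compare compute at equal reconstruction error.
import numpy as np

def fit_power_law(compute: np.ndarray, mse: np.ndarray):
    # linear fit in log-log space: log mse = b * log C + log a
    b, log_a = np.polyfit(np.log(compute), np.log(mse), deg=1)
    return np.exp(log_a), b

def compute_to_reach(target_mse: float, a: float, b: float) -> float:
    # invert mse = a * C^b  (b < 0 when error improves with compute)
    return (target_mse / a) ** (1.0 / b)

# Hypothetical (compute, MSE) points for two architectures, for illustration only:
C = np.array([1e15, 1e16, 1e17, 1e18])
mse_topk   = np.array([0.30, 0.21, 0.15, 0.11])
mse_switch = np.array([0.21, 0.15, 0.11, 0.08])

a_t, b_t = fit_power_law(C, mse_topk)
a_s, b_s = fit_power_law(C, mse_switch)
target = 0.15
print(compute_to_reach(target, a_t, b_t) / compute_to_reach(target, a_s, b_s))  # compute ratio at equal MSE, ~10x here
```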