Anish Mudide @amudide X Profile

Anish Mudide

@amudide

Followers

346

Following

485

Media

9

Statuses

42

Undergrad @MIT

Cambridge, MA

Joined October 2022


Anish Mudide

@amudide

1 month

RT @AchyutaBot: 🧵Can we understand vision language models by interpreting linear directions in their latents? Yes! In our new paper, Line…

0

26

0

Anish Mudide

@amudide

2 months

I'm at ICLR to present Switch SAEs. Come by 3pm - 5:30pm today at Hall 3 + Hall 2B #272.

Anish Mudide

@amudide

9 months

Sparse autoencoders (SAEs) allow us to peer into the inner workings of language models, but scaling them to frontier models is expensive. In our new paper, we introduce Switch Sparse Autoencoders, a novel architecture aimed at reducing the cost of training SAEs. 🧵 (1/13):

4

2

24

Anish Mudide

@amudide

5 months

RT @match_ten: (1/11) New paper! “Low-rank adapting models for Sparse Autoencoders.” While SAEs find interpretable latents, they hurt downs…

0

15

0

Anish Mudide

@amudide

6 months

the only thing worse than bad evals is realizing there are no bugs to blame.

1

10

Anish Mudide

@amudide

6 months

Our paper on Switch Sparse Autoencoders has been accepted to ICLR 2025 – see you in 🇸🇬!

Anish Mudide

@amudide

9 months

Sparse autoencoders (SAEs) allow us to peer into the inner workings of language models, but scaling them to frontier models is expensive. In our new paper, we introduce Switch Sparse Autoencoders, a novel architecture aimed at reducing the cost of training SAEs. 🧵 (1/13):

7

14

173

Anish Mudide

@amudide

6 months

RT @ethrbt_design: 🦜Introducing the Stochastic Parrot 🦜: An AI-powered motivational companion! The Stochastic Parrot sits on your shoulder…

0

5

0

Anish Mudide

@amudide

8 months

RT @ericjmichaud_: Since the internal structure of neural networks, through training, comes to reflect the structure of the external world,…

0

2

0

Anish Mudide

@amudide

9 months

RT @TransluceAI: Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and…

0

147

0

Anish Mudide

@amudide

9 months

RT @JoshAEngels: 1/11: New paper! "Decomposing the Dark Matter of Sparse Autoencoders." We find that SAE errors and error norms are linear…

0

37

0

Anish Mudide

@amudide

9 months

This work would not have been possible without support from the @MATSprogram. I'd also like to thank my collaborators @JoshAEngels, @ericjmichaud_, @tegmark, and @casdewitt! (13/13)

0

11

Anish Mudide

@amudide

9 months

Paper: GitHub: (12/13).

1

20

Anish Mudide

@amudide

9 months

We restrict our attention to a simple one-layer router that routes to a single expert. Future work could investigate hierarchical routers, routing to more than one expert, and techniques for feature deduplication. (11/13)

1

0

11
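
For readers who want to see what "a one-layer router that routes to a single expert" might look like, here is a minimal NumPy sketch. It assumes each expert is a small TopK SAE, that the router is a single linear layer followed by a softmax, and that the chosen expert's output is scaled by its router probability. All function and parameter names (`switch_sae_forward`, `W_router`, `b_pre`, and so on) are hypothetical and not taken from the thread or the released code.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def topk_relu(h, k):
    # Keep the k largest pre-activations (passed through ReLU), zero the rest.
    out = np.zeros_like(h)
    idx = np.argpartition(h, -k)[-k:]
    out[idx] = np.maximum(h[idx], 0.0)
    return out

def switch_sae_forward(x, W_router, W_enc, W_dec, b_pre, k):
    # Assumed shapes: x (d,), W_router (n_experts, d),
    # W_enc (n_experts, m, d), W_dec (n_experts, d, m), b_pre (d,).
    centered = x - b_pre
    probs = softmax(W_router @ centered)   # one-layer router
    expert = int(np.argmax(probs))         # route to a single expert
    latents = topk_relu(W_enc[expert] @ centered, k)
    # Assumption: scale the expert output by its router probability.
    recon = probs[expert] * (W_dec[expert] @ latents) + b_pre
    return recon, expert, latents
```

Only one expert's encoder and decoder are touched per input, which is where the compute saving would come from in this sketch.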

Anish Mudide

@amudide

9 months

In the encoder feature t-SNE projection, we can also directly observe feature duplication – around the periphery of the plot we find a variety of isolated points which upon closer inspection are actually tight groupings of multiple features from different experts. (10/13)

2

0

14

Anish Mudide

@amudide

9 months

One hypothesis for why Switch SAEs achieve a worse reconstruction MSE than TopK SAEs of the same size is that some experts learn duplicate features. We find that duplicate features likely reduce Switch SAE capacity by up to 10%. (9/13)

1

0

14
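
One rough way to estimate how many features are duplicated across experts is to look for near-identical feature directions that belong to different experts. The sketch below does this with cosine similarity. The function name `duplicate_fraction`, the `expert_ids` array, and the 0.9 threshold are all illustrative assumptions, not the procedure or cutoff used in the paper.

```python
import numpy as np

def duplicate_fraction(features, expert_ids, threshold=0.9):
    # features: (n_features, d) array of feature vectors.
    # expert_ids: (n_features,) array giving the expert each feature belongs to.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = unit @ unit.T  # pairwise cosine similarities
    # Ignore pairs within the same expert (this also masks self-similarity).
    same_expert = expert_ids[:, None] == expert_ids[None, :]
    sims[same_expert] = -1.0
    # A feature counts as duplicated if some other expert has a near-copy.
    return float(np.mean(sims.max(axis=1) > threshold))
```

The result is the fraction of features with a close match in another expert, which gives a crude upper-level view of wasted capacity under the chosen threshold.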

Anish Mudide

@amudide

9 months

To visualize the global structure of SAE features, we show t-SNE projections of the encoder and decoder feature vectors. We find that encoder features from the same expert cluster together, while decoder features tend to be more diffuse. (8/13)

1

0

14

Anish Mudide

@amudide

9 months

Sparsity and reconstruction quality are only proxies for what we really care about, which is the interpretability of the discovered features. Using automated interpretability measures, we demonstrate that Switch SAE features are just as interpretable as TopK SAE features. (7/13)

2

0

12

Anish Mudide

@amudide

9 months

Width-matched Switch SAEs underperform TopK SAEs, while mostly outperforming Gated and ReLU SAEs. When L0 is low, Switch SAEs perform particularly well. Switch SAEs can reduce the number of FLOPs per activation by up to 128x while retaining the performance of ReLU SAEs. (6/13)

1

0

12
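
A back-of-the-envelope count shows why routing can cut FLOPs per activation so sharply. The sketch below counts only encoder matrix-vector products, at roughly 2 FLOPs per weight, and assumes the latents are split evenly across experts. This is a simplified cost model with hypothetical function names, not the accounting used in the paper.

```python
def dense_encoder_flops(d_model, n_latents):
    # One matrix-vector product: about 2 FLOPs (multiply + add) per weight.
    return 2 * d_model * n_latents

def switch_encoder_flops(d_model, n_latents, n_experts):
    # Router cost plus the encoder of the single selected expert.
    router = 2 * d_model * n_experts
    expert = 2 * d_model * (n_latents // n_experts)
    return router + expert

def flop_reduction(d_model, n_latents, n_experts):
    # Ratio of dense encoder cost to Switch encoder cost under this model.
    dense = dense_encoder_flops(d_model, n_latents)
    switch = switch_encoder_flops(d_model, n_latents, n_experts)
    return dense / switch
```

With illustrative values such as `flop_reduction(768, 2**20, 128)`, the ratio comes out slightly below 128, because the router adds a small fixed cost on top of the one active expert.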

Anish Mudide

@amudide

9 months

FLOP-matched Switch SAEs deliver state-of-the-art quality, outperforming existing SAE architectures on the sparsity-reconstruction Pareto frontier. As we scale up the number of experts, performance continues to increase while keeping computational costs roughly constant. (5/13)

1

0

12

Anish Mudide

@amudide

9 months

Switch SAEs perform worse at a fixed number of parameters relative to TopK SAEs. We suspect that this is due to features being duplicated across experts. (4/13)

1

0

12

Anish Mudide

@amudide

9 months

We study scaling laws for Switch SAEs, comparing them to TopK SAEs at a fixed level of sparsity. We find that Switch SAEs using ∼1 OOM less compute can often achieve the same reconstruction MSE as TopK SAEs. (3/13)

1

17