Shruti Joshi

@_shruti_joshi_

Followers: 405 · Following: 2K · Media: 3 · Statuses: 179

phd student in identifiable repl @Mila_Quebec. prev. research programmer @MPI_IS Tübingen, undergrad @IITKanpur '19.

Montreal, Canada
Joined August 2018
@_shruti_joshi_
Shruti Joshi
6 months
1\ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧶
1
10
46
@_shruti_joshi_
Shruti Joshi
1 month
^on Saturday, 19th July.
0
0
0
@_shruti_joshi_
Shruti Joshi
1 month
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
@_shruti_joshi_
Shruti Joshi
3 months
RT @Sahil1V: 🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard that detects harmful prompts across…
0
39
0
@_shruti_joshi_
Shruti Joshi
5 months
RT @soumyesinghal: ⚡⚡ Llama-Nemotron-Ultra-253B just dropped: our most advanced open reasoning model. 🧵👇
0
13
0
@_shruti_joshi_
Shruti Joshi
5 months
RT @arkil_patel: Thoughtology paper is out! 🔥🐋 We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and fin…
0
5
0
@_shruti_joshi_
Shruti Joshi
6 months
RT @arkil_patel: Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨ Work w/ fantastic advisors @DBahdanau and @siv…
0
18
0
@_shruti_joshi_
Shruti Joshi
6 months
5\ So, does it actually work? We show that SSAE accurately steers embeddings on both semi-synthetic and real-world datasets (like TruthfulQA) using Llama-3.1-8B, handling in- and out-of-distribution data with ease.
1
0
1
@_shruti_joshi_
Shruti Joshi
6 months
4\ What does this mean for steering? You get access to steering vectors for individual concepts: each vector consistently steers only a single concept and can be scaled according to the context (a rough sketch follows below).
1
0
1
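As a rough illustration of that per-concept, scalable steering: assuming a trained SSAE with a linear decoder (a fuller sketch appears after tweet 2\ further down this thread), each decoder column acts as a steering vector for one concept, and a scalar coefficient controls how strongly it is applied. The function and variable names here are hypothetical, not from the thread.

```python
# Hypothetical illustration only: steer an embedding along one learned concept
# direction from an SSAE decoder. W_dec has shape (embed_dim, n_concepts), so
# each column is the steering vector for a single concept; alpha sets how
# strongly that concept is pushed, and can vary with context.
import torch

def steer(embedding: torch.Tensor, W_dec: torch.Tensor, concept_idx: int, alpha: float) -> torch.Tensor:
    direction = W_dec[:, concept_idx]     # one concept's steering vector
    return embedding + alpha * direction  # shift only along that concept

# e.g. steered = steer(h, ssae.decoder.weight, concept_idx=3, alpha=2.0)
```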
@_shruti_joshi_
Shruti Joshi
6 months
3\ With sufficiently diverse data (such as in the real world), SSAEs remain identifiable up to permutation and scaling: repeated runs yield consistent representations, differing only by trivial indeterminacies (a consistency-check sketch follows below).
1
0
3
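One way to probe that consistency claim empirically (a hedged sketch, not necessarily the paper's evaluation protocol) is a mean-correlation-style check common in the identifiability literature: train the model twice, correlate the two runs' shift codes, and see whether features match one-to-one once a permutation and per-feature rescaling are allowed.

```python
# Hypothetical cross-run consistency check: if two runs agree up to permutation
# and scaling, the absolute correlation matrix between their codes is close to
# a permutation matrix, and the matched correlations are close to 1.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cross_run_consistency(z_run1: np.ndarray, z_run2: np.ndarray) -> float:
    # z_run1, z_run2: (n_samples, n_concepts) shift codes from two training runs
    n = z_run1.shape[1]
    corr = np.corrcoef(z_run1.T, z_run2.T)[:n, n:]     # cross-run correlation block
    rows, cols = linear_sum_assignment(-np.abs(corr))  # best one-to-one feature matching
    return float(np.abs(corr[rows, cols]).mean())      # ~1.0 => consistent up to perm/scale
```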
@_shruti_joshi_
Shruti Joshi
6 months
2\ The SSAE is designed to map the difference between text embeddings (varying across multiple unknown concepts) to a sparse representation. Unlike standard SAEs, which impose sparsity on the concept representations themselves, we impose sparsity on the shifts in those concept representations (a minimal sketch follows below).
1
0
2
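A minimal sketch of that shift-sparsity idea, assuming a PyTorch setup; the class, dimensions, and loss weight below are hypothetical illustrations rather than the released implementation.

```python
# Hypothetical sketch of a Sparse Shift Autoencoder (SSAE): it autoencodes the
# *difference* between two text embeddings and places the sparsity penalty on
# that shift's code, so each observed change is explained by few concepts.
import torch
import torch.nn as nn

class SparseShiftAutoencoder(nn.Module):
    def __init__(self, embed_dim: int, n_concepts: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, n_concepts, bias=False)
        self.decoder = nn.Linear(n_concepts, embed_dim, bias=False)

    def forward(self, delta: torch.Tensor):
        # delta: (batch, embed_dim), the shift x_after - x_before
        z = self.encoder(delta)   # sparse code of the shift
        recon = self.decoder(z)   # reconstructed shift
        return recon, z

def ssae_loss(recon: torch.Tensor, delta: torch.Tensor, z: torch.Tensor, l1_weight: float = 1e-3) -> torch.Tensor:
    # Reconstruct the shift while penalizing how many concept directions it uses.
    return ((recon - delta) ** 2).mean() + l1_weight * z.abs().mean()

# Illustrative usage on pairs of embeddings that differ in unknown concepts:
# ssae = SparseShiftAutoencoder(embed_dim=4096, n_concepts=64)
# recon, z = ssae(x_after - x_before)
# loss = ssae_loss(recon, x_after - x_before, z)
```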
@_shruti_joshi_
Shruti Joshi
10 months
RT @Sahil1V: 📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitat…
0
61
0
@_shruti_joshi_
Shruti Joshi
10 months
RT @Tom__Marty: 🚨 NEW PAPER OUT 🚨 Excited to share our latest research initiative on in-context learning and meta-learning through the len…
arxiv.org: A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in...
0
3
0
@_shruti_joshi_
Shruti Joshi
11 months
RT @leenaCvankadara: I am thrilled to announce that I will be joining the Gatsby Computational Neuroscience Unit at UCL as a Lecturer (Assi…
0
6
0
@_shruti_joshi_
Shruti Joshi
1 year
RT @arkil_patel: Presenting tomorrow at #NAACL2024: Can LLMs in-context learn to use new programming libraries and languages? Yes. Kind o…
0
20
0
@_shruti_joshi_
Shruti Joshi
1 year
RT @ncmeade: Adversarial Triggers For LLMs Are NOT Universal! 😲 It is believed that adversarial triggers that jailbreak a model transfer un…
0
32
0
@_shruti_joshi_
Shruti Joshi
1 year
RT @arkil_patel: 📢 Exciting new work on AI safety! Do adversarial triggers transfer universally across models (as has been claimed)? No. …
0
6
0
@_shruti_joshi_
Shruti Joshi
2 years
RT @arkil_patel: Presenting tomorrow at #EMNLP2023: MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Gene…
0
17
0
@_shruti_joshi_
Shruti Joshi
2 years
RT @seblachap: 1/ Excited for our oral presentation at #NeurIPS2023 on "Additive Decoders for Latent Variables Identification and Cartesian…
0
27
0