Shruti Joshi

@_shruti_joshi_

Followers: 405 · Following: 2K · Media: 3 · Statuses: 179

phd student in identifiable repl @Mila_Quebec. prev. research programmer @MPI_IS Tübingen, undergrad @IITKanpur '19.

Montreal, Canada
Joined August 2018
@_shruti_joshi_
Shruti Joshi
6 months
1\ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧶
1
10
46
@_shruti_joshi_
Shruti Joshi
1 month
^on Saturday, 19th July.
0
0
0
@_shruti_joshi_
Shruti Joshi
1 month
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
@_shruti_joshi_
Shruti Joshi
3 months
RT @Sahil1V: 🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard that detects harmful prompts across…
0
39
0
@_shruti_joshi_
Shruti Joshi
5 months
RT @soumyesinghal: ⚡⚡ Llama-Nemotron-Ultra-253B just dropped: our most advanced open reasoning model. 🧵👇
0
13
0
@_shruti_joshi_
Shruti Joshi
5 months
RT @arkil_patel: Thoughtology paper is out! 🔥🐋 We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and fin…
0
5
0
@_shruti_joshi_
Shruti Joshi
6 months
RT @arkil_patel: Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨ Work w/ fantastic advisors @DBahdanau and @siv…
0
18
0
@_shruti_joshi_
Shruti Joshi
6 months
5\ So, does it actually work? We show that SSAE accurately steers embeddings on both semi-synthetic and real-world datasets (like TruthfulQA) using Llama-3.1-8B, handling in- and out-of-distribution data with ease.
1
0
1
@_shruti_joshi_
Shruti Joshi
6 months
4\ What does this mean for steering? You get access to steering vectors for individual concepts: each vector consistently steers only a single concept and can be scaled according to the context (a rough sketch follows below).
1
0
1
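As a rough illustration of that per-concept, scalable steering: assuming a trained SSAE with a linear decoder (a fuller sketch appears after tweet 2\ further down this thread), each decoder column acts as a steering vector for one concept, and a scalar coefficient controls how strongly it is applied. The function and variable names here are hypothetical, not from the thread.

```python
# Hypothetical illustration only: steer an embedding along one learned concept
# direction from an SSAE decoder. W_dec has shape (embed_dim, n_concepts), so
# each column is the steering vector for a single concept; alpha sets how
# strongly that concept is pushed, and can vary with context.
import torch

def steer(embedding: torch.Tensor, W_dec: torch.Tensor, concept_idx: int, alpha: float) -> torch.Tensor:
    direction = W_dec[:, concept_idx]     # one concept's steering vector
    return embedding + alpha * direction  # shift only along that concept

# e.g. steered = steer(h, ssae.decoder.weight, concept_idx=3, alpha=2.0)
```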
@_shruti_joshi_
Shruti Joshi
6 months
3\ With sufficiently diverse data (such as in the real world), SSAEs remain identifiable up to permutation and scaling: repeated runs yield consistent representations, differing only by trivial indeterminacies (a consistency-check sketch follows below).
1
0
3
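One way to probe that consistency claim empirically (a hedged sketch, not necessarily the paper's evaluation protocol) is a mean-correlation-style check common in the identifiability literature: train the model twice, correlate the two runs' shift codes, and see whether features match one-to-one once a permutation and per-feature rescaling are allowed.

```python
# Hypothetical cross-run consistency check: if two runs agree up to permutation
# and scaling, the absolute correlation matrix between their codes is close to
# a permutation matrix, and the matched correlations are close to 1.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cross_run_consistency(z_run1: np.ndarray, z_run2: np.ndarray) -> float:
    # z_run1, z_run2: (n_samples, n_concepts) shift codes from two training runs
    n = z_run1.shape[1]
    corr = np.corrcoef(z_run1.T, z_run2.T)[:n, n:]     # cross-run correlation block
    rows, cols = linear_sum_assignment(-np.abs(corr))  # best one-to-one feature matching
    return float(np.abs(corr[rows, cols]).mean())      # ~1.0 => consistent up to perm/scale
```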
@_shruti_joshi_
Shruti Joshi
6 months
2\ The SSAE is designed to map the difference between text embeddings (varying across multiple unknown concepts) to a sparse representation. Unlike standard SAEs, which impose sparsity on the concept representations themselves, we impose sparsity on the shifts in those concept representations (a minimal sketch follows below).
1
0
2
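A minimal sketch of that shift-sparsity idea, assuming a PyTorch setup; the class, dimensions, and loss weight below are hypothetical illustrations rather than the released implementation.

```python
# Hypothetical sketch of a Sparse Shift Autoencoder (SSAE): it autoencodes the
# *difference* between two text embeddings and places the sparsity penalty on
# that shift's code, so each observed change is explained by few concepts.
import torch
import torch.nn as nn

class SparseShiftAutoencoder(nn.Module):
    def __init__(self, embed_dim: int, n_concepts: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, n_concepts, bias=False)
        self.decoder = nn.Linear(n_concepts, embed_dim, bias=False)

    def forward(self, delta: torch.Tensor):
        # delta: (batch, embed_dim), the shift x_after - x_before
        z = self.encoder(delta)   # sparse code of the shift
        recon = self.decoder(z)   # reconstructed shift
        return recon, z

def ssae_loss(recon: torch.Tensor, delta: torch.Tensor, z: torch.Tensor, l1_weight: float = 1e-3) -> torch.Tensor:
    # Reconstruct the shift while penalizing how many concept directions it uses.
    return ((recon - delta) ** 2).mean() + l1_weight * z.abs().mean()

# Illustrative usage on pairs of embeddings that differ in unknown concepts:
# ssae = SparseShiftAutoencoder(embed_dim=4096, n_concepts=64)
# recon, z = ssae(x_after - x_before)
# loss = ssae_loss(recon, x_after - x_before, z)
```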
@_shruti_joshi_
Shruti Joshi
10 months
RT @Sahil1V: 📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitat…
0
61
0
@_shruti_joshi_
Shruti Joshi
10 months
RT @Tom__Marty: 🚨 NEW PAPER OUT 🚨 Excited to share our latest research initiative on in-context learning and meta-learning through the len…
arxiv.org: A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in...
0
3
0
@_shruti_joshi_
Shruti Joshi
11 months
RT @leenaCvankadara: I am thrilled to announce that I will be joining the Gatsby Computational Neuroscience Unit at UCL as a Lecturer (Assi…
0
6
0
@_shruti_joshi_
Shruti Joshi
1 year
RT @arkil_patel: Presenting tomorrow at #NAACL2024: Can LLMs in-context learn to use new programming libraries and languages? Yes. Kind o…
0
20
0
@_shruti_joshi_
Shruti Joshi
1 year
RT @ncmeade: Adversarial Triggers For LLMs Are NOT Universal! 😲 It is believed that adversarial triggers that jailbreak a model transfer un…
0
32
0
@_shruti_joshi_
Shruti Joshi
1 year
RT @arkil_patel: 📢 Exciting new work on AI safety! Do adversarial triggers transfer universally across models (as has been claimed)? No. …
0
6
0
@_shruti_joshi_
Shruti Joshi
2 years
RT @arkil_patel: Presenting tomorrow at #EMNLP2023: MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Gene…
0
17
0
@_shruti_joshi_
Shruti Joshi
2 years
RT @seblachap: 1/ Excited for our oral presentation at #NeurIPS2023 on "Additive Decoders for Latent Variables Identification and Cartesian…
0
27
0