
Shruti Joshi
@_shruti_joshi_
Followers: 405 · Following: 2K · Media: 3 · Statuses: 179
phd student in identifiable repl @Mila_Quebec. prev. research programmer @MPI_IS Tübingen, undergrad @IITKanpur '19.
Montreal, Canada
Joined August 2018
1/ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧵
1
10
46
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
1/ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧵
1
6
24
RT @Sahil1V: 🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard that detects harmful prompts across…
0
39
0
RT @soumyesinghal: ⚡⚡ Llama-Nemotron-Ultra-253B just dropped: our most advanced open reasoning model. 🧵👇
0
13
0
RT @arkil_patel: Thoughtology paper is out! 🔥 We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and fin…
0
5
0
RT @arkil_patel: Presenting ✨ CHASE: Generating Challenging Synthetic Data for Evaluation ✨ Work w/ fantastic advisors @DBahdanau and @siv…
0
18
0
Curious to find out more? Check out our pre-print at: Work done with an amazing set of researchers: @andrea_dittadi, @seblachap, and @dhanya_sridhar!
arxiv.org
Steering methods manipulate the representations of large language models (LLMs) to induce responses that have desired properties, e.g., truthfulness, offering a promising approach for LLM...
0
0
6
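The linked preprint's abstract describes steering: nudging an LLM's internal representations along a concept direction to elicit a desired property. As a rough illustration only (the function name `steer`, the scale `alpha`, and the toy dimensions are assumptions, not details from the paper), the core operation can be sketched as adding a scaled direction vector to a hidden state:

```python
import numpy as np

def steer(hidden, steering_vec, alpha=4.0):
    # Activation steering: shift a hidden representation along a
    # concept direction so generations lean toward that property.
    return hidden + alpha * steering_vec

rng = np.random.default_rng(0)
h = rng.normal(size=(1, 16))   # stand-in for one token's hidden state
v = rng.normal(size=(16,))
v /= np.linalg.norm(v)         # unit-norm concept direction
h_steered = steer(h, v)
print(h_steered.shape)         # (1, 16)
```

In practice the direction would come from a learned dictionary (e.g. an autoencoder's decoder column) rather than random noise; this toy version only shows the additive intervention itself.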
RT @Sahil1V: 📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitat…
0
61
0
RT @Tom__Marty: 🚨 NEW PAPER OUT 🚨 Excited to share our latest research initiative on in-context learning and meta-learning through the len…
arxiv.org
A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in...
0
3
0
RT @leenaCvankadara: I am thrilled to announce that I will be joining the Gatsby Computational Neuroscience Unit at UCL as a Lecturer (Assi…
0
6
0
RT @arkil_patel: Presenting tomorrow at #NAACL2024: Can LLMs in-context learn to use new programming libraries for code generation? Yes. Work w…
0
20
0
RT @ncmeade: Adversarial Triggers For LLMs Are NOT Universal! It is believed that adversarial triggers that jailbreak a model transfer un…
0
32
0
RT @arkil_patel: 📢 Exciting new work on AI safety! Do adversarial triggers transfer universally across models (as has been claimed)? No.…
0
6
0
RT @arkil_patel: Presenting tomorrow at #EMNLP2023: MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Gene…
0
17
0
RT @seblachap: 1/ Excited for our oral presentation at #NeurIPS2023 on "Additive Decoders for Latent Variables Identification and Cartesian…
0
27
0