Shruti Joshi
@_shruti_joshi_
404 Followers · 2K Following · 3 Media · 181 Statuses

phd student in identifiable repl. prev. research programmer @MPI_IS Tübingen, undergrad @IITKanpur '19.

Montreal, Canada
Joined August 2018
@_shruti_joshi_
Shruti Joshi
10 months
1\ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧶
@divyat09
Divyat Mahajan
2 months
[1/9] While pretraining data might be hitting a wall, novel methods for modeling it are just getting started! We introduce future summary prediction (FSP), where the model predicts future sequence embeddings to reduce teacher forcing & shortcut learning. 📌 Predict a learned …
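The tweet is cut off here, but the mechanism it names (an auxiliary head that predicts an embedding summarizing upcoming tokens, alongside standard next-token prediction) can be sketched. A minimal sketch, assuming a mean-pooled future window as the "summary" and an MSE objective; `FutureSummaryHead`, `fsp_loss`, and the horizon are illustrative choices, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FutureSummaryHead(nn.Module):
    """Hypothetical auxiliary head: from the hidden state at position t,
    predict a summary embedding of the future tokens t+1..t+k."""
    def __init__(self, d_model: int, d_summary: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_summary)

    def forward(self, hidden):            # hidden: (batch, seq, d_model)
        return self.proj(hidden)          # (batch, seq, d_summary)

def future_summary_targets(token_emb, horizon=8):
    """One plausible 'future summary': the mean embedding of the next
    `horizon` tokens, at every position with a full future window."""
    seq = token_emb.size(1)
    assert seq > horizon, "sequence too short for this horizon"
    t = torch.stack(
        [token_emb[:, i + 1 : i + 1 + horizon].mean(dim=1)
         for i in range(seq - horizon)],
        dim=1,
    )
    return t.detach()                     # stop-gradient: predict the target, don't move it

def fsp_loss(head, hidden, token_emb, horizon=8):
    """Auxiliary loss added to the usual next-token objective.
    Assumes d_summary equals the token-embedding dim so the MSE typechecks."""
    target = future_summary_targets(token_emb, horizon)
    pred = head(hidden)[:, : target.size(1)]   # drop tail positions without a window
    return F.mse_loss(pred, target)
```

The stop-gradient on the target reflects one plausible design choice: the head learns to anticipate the future, rather than the future embeddings drifting to meet the head.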
@arkil_patel
Arkil Patel
2 months
I'm at CoLM this week! Come check out our work on evaluating RMs for agent trajectories! These days, I'm thinking about forecasting generalization, scaling laws, and safety/adversarial attacks. Ping me if you wanna chat about research!
@xhluca
Xing Han Lu
2 months
i will be presenting AgentRewardBench at #COLM2025 next week! session: #3 date: wednesday 11am to 1pm poster: #545 come learn more about the paper, my recent works or just chat about anything (montreal, mila, etc.) here's a teaser of my poster :)
@_shruti_joshi_
Shruti Joshi
5 months
^on Saturday, 19th July.
@_shruti_joshi_
Shruti Joshi
5 months
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
@Sahil1V
Sahil Verma
6 months
🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard, which detects harmful prompts across multiple languages & modalities with a single approach, achieving SOTA performance in all 3 modalities while being 120X faster 🚀 https://t.co/r6DGPDfwle
@soumyesinghal
Soumye Singhal
8 months
⚡⚡ Llama-Nemotron-Ultra-253B just dropped: our most advanced open reasoning model 🧵👇
@arkil_patel
Arkil Patel
9 months
𝐓𝐡𝐨𝐮𝐠𝐡𝐭𝐨𝐥𝐨𝐠𝐲 paper is out! 🔥🐋 We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and find several surprising and interesting phenomena! Incredible effort by the entire team! 🌐: https://t.co/CDlFHD28xQ
@saraveramarjano
Sara Vera Marjanović
9 months
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1's reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour. 🔗: https://t.co/Cyy18kYQ45
@arkil_patel
Arkil Patel
10 months
Presenting ✨ 𝐂𝐇𝐀𝐒𝐄: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐢𝐧𝐠 𝐬𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐝𝐚𝐭𝐚 𝐟𝐨𝐫 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 ✨ Work w/ fantastic advisors @DBahdanau and @sivareddyg Thread 🧵:
@_shruti_joshi_
Shruti Joshi
10 months
Curious to find out more? Check out our pre-print at: https://t.co/iKlCxwNhgC. Work done with an amazing set of researchers: @andrea_dittadi, @seblachap, and @dhanya_sridhar!
@_shruti_joshi_
Shruti Joshi
10 months
5\ So, does it actually work? We show that SSAE accurately steers embeddings on both semi-synthetic and real-world datasets (like TruthfulQA) using Llama-3.1-8B, handling in- and out-of-distribution data with ease.
@_shruti_joshi_
Shruti Joshi
10 months
4\ What does this mean for steering? You get access to steering vectors for individual concepts, such that each vector consistently steers only a single concept, and can be scaled according to the context.
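As a rough sketch of what per-concept, context-scalable steering could look like in code, under the assumption that each latent dimension's decoder column serves as one concept's steering vector (`steer`, `concept_idx`, and `alpha` are hypothetical names for illustration):

```python
import torch

def steer(embedding, ssae_decoder, concept_idx, alpha=1.0):
    """Shift a text embedding along a single concept's direction.

    ssae_decoder: an nn.Linear mapping the sparse latent back to
    embedding space; its weight has shape (d_embed, d_latent), so
    column `concept_idx` is that concept's steering vector.
    `alpha` scales the edit to suit the context."""
    direction = ssae_decoder.weight[:, concept_idx]   # (d_embed,)
    return embedding + alpha * direction

# e.g.: steered = steer(e, model.decoder, concept_idx=7, alpha=2.0)
```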
@_shruti_joshi_
Shruti Joshi
10 months
3\ With sufficiently diverse data (such as in the real world), SSAEs remain identifiable up to permutation and scaling: repeated runs yield consistent representations, differing only by trivial indeterminacies.
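One way to probe such a claim empirically: train the model twice, encode the same inputs with both runs, and check that the codes match up to permutation and scale. The mean-correlation-coefficient diagnostic below is a standard tool in the identifiability literature, not necessarily the paper's exact protocol:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_corr_coef(z1, z2):
    """z1, z2: (n_samples, d_latent) latent codes from two independent
    training runs on the same inputs. Match dimensions with the
    Hungarian algorithm, then average the absolute correlations;
    a value near 1.0 means the runs agree up to permutation and scale."""
    d = z1.shape[1]
    corr = np.corrcoef(z1.T, z2.T)[:d, d:]             # d x d cross-correlation block
    rows, cols = linear_sum_assignment(-np.abs(corr))  # matching that maximizes |corr|
    return np.abs(corr[rows, cols]).mean()
```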
@_shruti_joshi_
Shruti Joshi
10 months
2\ The SSAE maps the difference between text embeddings (which vary across multiple unknown concepts) to a sparse representation. Unlike standard SAEs, which impose sparsity on the concept representations themselves, we impose sparsity on the shifts in concept space.
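A minimal sketch of this setup, assuming paired embeddings and an L1 penalty as the sparsity mechanism; the layer shapes and the exact penalty are guesses, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseShiftAutoencoder(nn.Module):
    """Sketch: autoencode the *difference* between two text embeddings,
    pushing the latent code of that shift to be sparse."""
    def __init__(self, d_embed: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_latent)
        self.decoder = nn.Linear(d_latent, d_embed)

    def forward(self, e1, e2):
        delta = e2 - e1              # shift between a pair of embeddings
        z = self.encoder(delta)      # code for the shift
        return self.decoder(z), z

def ssae_loss(model, e1, e2, l1_weight=1e-3):
    recon, z = model(e1, e2)
    # reconstruct the shift itself; the L1 term keeps its code sparse
    return F.mse_loss(recon, e2 - e1) + l1_weight * z.abs().mean()
```

The contrast with a standard SAE is in what gets penalized: here the L1 term acts on the code of the difference e2 - e1, so only the few concepts that actually changed between the pair should be active.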
@Sahil1V
Sahil Verma
1 year
📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitate it. This question is critical for understanding and mitigating the copyright and privacy infringements of these models! https://t.co/bvdVU1M0Hh
@Tom__Marty
Tom Marty
1 year
🚨 NEW PAPER OUT 🚨 Excited to share our latest research initiative on in-context learning and meta-learning through the lens of information theory! 🧠 🔗 https://t.co/Tj5cYudDwy Check out our insights and empirical experiments! 🔍
arxiv.org
A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in...
@EricElmoznino
Eric Elmoznino
1 year
Introducing our new paper explaining in-context learning through the lens of Occam's razor, giving a normative account of next-token prediction objectives. This was with @Tom__Marty @tejaskasetty @le0gagn0n @sarthmit @MahanFathi @dhanya_sridhar @g_lajoie_
@leenaCvankadara
Leena C Vankadara
1 year
I am thrilled to announce that I will be joining the Gatsby Computational Neuroscience Unit at UCL as a Lecturer (Assistant Professor) in Feb 2025! Looking forward to working with the exceptional talent at @GatsbyUCL on cutting-edge problems in deep learning and causality.
@GatsbyUCL
Gatsby Computational Neuroscience Unit
1 year
We are delighted to announce that Dr Leena Chennuru Vankadara will join the Unit as Lecturer in Feb 2025, developing theoretical understandings of scaling and generalization in deep learning and causality. Welcome aboard @leenaCvankadara! Learn more at https://t.co/jASvmzGZFP
@arkil_patel
Arkil Patel
1 year
Presenting tomorrow at #NAACL2024: 𝐶𝑎𝑛 𝐿𝐿𝑀𝑠 𝑖𝑛-𝑐𝑜𝑛𝑡𝑒𝑥𝑡 𝑙𝑒𝑎𝑟𝑛 𝑡𝑜 𝑢𝑠𝑒 𝑛𝑒𝑤 𝑝𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 𝑙𝑖𝑏𝑟𝑎𝑟𝑖𝑒𝑠 𝑎𝑛𝑑 𝑙𝑎𝑛𝑔𝑢𝑎𝑔𝑒𝑠? 𝑌𝑒𝑠. 𝐾𝑖𝑛𝑑 𝑜𝑓. Internship @allen_ai work with @pdasigi and my advisors @DBahdanau and @sivareddyg.
@ncmeade
Nicholas Meade
2 years
Adversarial Triggers For LLMs Are 𝗡𝗢𝗧 𝗨𝗻𝗶𝘃𝗲𝗿𝘀𝗮𝗹! 😲 It is believed that adversarial triggers that jailbreak a model transfer universally to other models. But we show triggers don't reliably transfer, especially to RLHF/DPO models. Paper: https://t.co/nRdw2h1rgS
@arkil_patel
Arkil Patel
2 years
📢 Exciting new work on AI safety! Do adversarial triggers transfer universally across models (as has been claimed)? 𝗡𝗼. Are models aligned by supervised fine-tuning safe against adversarial triggers? 𝗡𝗼. RLHF and DPO are far better!