Harshay Shah (@harshays_)
Followers: 657 · Following: 5K · Media: 3 · Statuses: 22
Joined May 2012
Harshay Shah (@harshays_) · 10 months ago
MoEs provide two knobs for scaling: model size (total params) + FLOPs-per-token (via active params). What’s the right scaling strategy? And how does it depend on the pretraining budget? Our work introduces sparsity-aware scaling laws for MoE LMs to tackle these questions! 🧵👇
Samira Abnar (@samira_abnar) · 11 months ago
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? We explored this through the lens of MoEs:
Replies: 1 · Reposts: 6 · Likes: 36
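For intuition, here is a minimal sketch of what a sparsity-aware scaling law could look like: a Chinchilla-style power law in total parameters and tokens, extended with a sparsity term. All coefficients and the exact functional form below are illustrative assumptions, not the fitted values from the paper.

```python
# Illustrative Chinchilla-style loss for an MoE language model.
# All coefficients are made up for illustration; the paper fits its own.
def moe_scaling_loss(n_total, n_active, tokens,
                     a=400.0, b=2000.0, e=1.7,
                     alpha=0.34, beta=0.28, gamma=0.08):
    # n_total  : total parameters (capacity knob)
    # n_active : active parameters per token (FLOPs-per-token knob)
    # tokens   : pretraining tokens
    sparsity = 1.0 - n_active / n_total
    return (e
            + a / n_total ** alpha      # capacity term
            + b / tokens ** beta        # data term
            + gamma * sparsity ** 2)    # assumed sparsity penalty

# Spend the same training budget (FLOPs ~ 6 * n_active * tokens)
# on a dense model vs. a sparser MoE with the same active params.
budget = 1e21
for n_total, n_active in [(8e9, 8e9), (64e9, 8e9)]:
    tokens = budget / (6 * n_active)
    print(f"{n_total:.0e} total / {n_active:.0e} active -> "
          f"loss {moe_scaling_loss(n_total, n_active, tokens):.3f}")
```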
Lucas Nestler (@Clashluke) · 11 months ago
Wake up babe, new MoE scaling laws dropped
Replies: 6 · Reposts: 48 · Likes: 431
MIT CSAIL (@MIT_CSAIL) · 1 year ago
How can we really know if a chatbot is giving a reliable answer? 🧵 MIT CSAIL’s "ContextCite" tool can ID the parts of external context used to generate any particular statement from a language model, improving trust by helping users easily verify the statement:
Replies: 3 · Reposts: 14 · Likes: 48
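A rough sketch of the ContextCite idea as described above: ablate random subsets of context sources, record how the model's log-probability of its original response changes, and fit a sparse linear surrogate whose weights attribute the response to sources. The `logprob_fn` hook and all parameter choices below are stand-ins, not the released package's API.

```python
import numpy as np
from sklearn.linear_model import Lasso

def contextcite_sketch(sources, logprob_fn, n_samples=64, keep_p=0.5, seed=0):
    # sources    : list of context chunks (e.g., sentences)
    # logprob_fn : hypothetical hook -- given an (ablated) context string,
    #              returns the log-prob the LM assigns to the original response
    rng = np.random.default_rng(seed)
    masks = rng.random((n_samples, len(sources))) < keep_p
    ys = np.array([
        logprob_fn(" ".join(s for s, keep in zip(sources, row) if keep))
        for row in masks
    ])
    # Sparse linear surrogate: inclusion mask -> log-prob; its weights
    # attribute the response to individual sources.
    surrogate = Lasso(alpha=0.01).fit(masks.astype(float), ys)
    return surrogate.coef_  # one attribution score per source
```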
MIT CSAIL (@MIT_CSAIL) · 1 year ago
How do black-box neural networks transform raw data into predictions? Inside these models are thousands of simple "components" working together. New MIT CSAIL research (https://t.co/1qDZFIQUaZ) introduces a method that helps us understand how these components compose to affect…
Link card: arxiv.org — "How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question…"
Replies: 4 · Reposts: 62 · Likes: 253
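The component-modeling recipe hinted at here can be sketched in an ablate-and-regress spirit: ablate random subsets of components, measure the effect on a fixed prediction, and fit a linear map from ablation masks to outputs. The `ablate_and_eval` hook is a hypothetical stand-in for model-surgery code (zeroing conv filters or attention heads), not the paper's implementation.

```python
import numpy as np

def fit_component_model(ablate_and_eval, n_components,
                        n_samples=256, ablate_p=0.05, seed=0):
    # ablate_and_eval : hypothetical hook -- given a boolean mask, ablates
    #                   the masked components (conv filters, attn heads) and
    #                   returns a scalar output (e.g., a logit margin) on a
    #                   fixed example
    rng = np.random.default_rng(seed)
    masks = rng.random((n_samples, n_components)) < ablate_p
    ys = np.array([ablate_and_eval(m) for m in masks])
    # Linear map from ablation masks to outputs; each coefficient estimates
    # one component's contribution to the prediction.
    coef, *_ = np.linalg.lstsq(masks.astype(float), ys - ys.mean(), rcond=None)
    return coef
```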
Harshay Shah (@harshays_) · 1 year ago
♥️
ESPNcricinfo (@ESPNcricinfo) · 1 year ago
THE WAIT IS OVER, INDIA! T20 WORLD CUP CHAMPIONS FOR THE SECOND TIME! 🇮🇳🏆
Replies: 0 · Reposts: 0 · Likes: 3
Aleksander Madry (@aleks_madry) · 2 years ago
How is an LLM actually using the info given to it in its context? Is it misinterpreting anything or making things up? Introducing ContextCite: a simple method for attributing LLM responses back to the context: https://t.co/bm1t7nybbh w/ @bcohenwang, @harshays_, @kris_georgiev1
Replies: 7 · Reposts: 49 · Likes: 243
Harshay Shah (@harshays_) · 2 years ago
New work with @andrew_ilyas and @aleks_madry on tracing predictions back to individual components (conv filters, attn heads) in the model! Paper: https://t.co/zEJ3oV0wrF Thread: 👇
Link card: arxiv.org — "How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question…"
Aleksander Madry (@aleks_madry) · 2 years ago
How do model components (conv filters, attn heads) collectively transform examples into predictions? Is it possible to somehow dissect how *every* model component contributes to a prediction? w/ @harshays_ @andrewilyas, we introduce a framework for tackling this question!
Replies: 1 · Reposts: 11 · Likes: 49
Harshay Shah (@harshays_) · 2 years ago
If you are at #ICML2023 today, check out our work on ModelDiff, a model-agnostic framework for pinpointing differences between any two (supervised) learning algorithms! Poster: #407 at 2pm (Wednesday) Paper: https://t.co/sMXNJvm38M w/ @smsampark @andrew_ilyas @aleks_madry
Replies: 0 · Reposts: 12 · Likes: 52
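One way to picture ModelDiff's comparison, given precomputed data-attribution matrices for the two algorithms: project one algorithm's attributions off the other's span and inspect the top principal directions of the residual, which flag training-set signals one algorithm uses and the other ignores. Shapes and names below are assumptions, not the library's interface.

```python
import numpy as np

def distinguishing_directions(theta_a, theta_b, k=3):
    # theta_a, theta_b : (n_test, n_train) data-attribution matrices for
    #                    algorithms A and B (e.g., datamodel weights)
    # Project A's attributions off the row space of B's, then take the top
    # principal directions of the residual: training-set directions that
    # drive A's predictions but not B's.
    q, _ = np.linalg.qr(theta_b.T)               # orthonormal basis of B's rows
    residual = theta_a - (theta_a @ q) @ q.T     # remove B-explainable part
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    return vt[:k]                                # (k, n_train) directions
```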
Andrew Ilyas (@andrew_ilyas) · 3 years ago
TRAK, our latest work on data attribution (https://t.co/OuY6lu8tfm), speeds up datamodels by up to 1000x! ➡️ our earlier work ModelDiff (w/ @harshays_ @smsampark @aleks_madry) can now compare any two learning algorithms in larger-scale settings. Try it out:
Link card: github.com — "ModelDiff: A Framework for Comparing Learning Algorithms" (MadryLab/modeldiff)
Aleksander Madry (@aleks_madry) · 3 years ago
You’re deploying an ML system, choosing between two models trained w/ diff algs. Same training data, same acc... how do you differentiate their behavior? ModelDiff (https://t.co/wJI2dOAGc1) lets you compare *any* two learning algs! w/ @harshays_ @smsampark @andrew_ilyas (1/8)
Replies: 1 · Reposts: 13 · Likes: 42
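TRAK's speedup comes from working with randomly projected per-example gradients rather than retraining. A back-of-the-envelope sketch of that core idea (the released `trak` package handles checkpoints, ensembling, and a proper loss weighting that are omitted here):

```python
import numpy as np

def trak_style_scores(grads_train, grads_test, proj_dim=512, seed=0):
    # grads_train : (n_train, d) per-example gradients at a checkpoint
    # grads_test  : (n_test, d) gradients of the test losses
    rng = np.random.default_rng(seed)
    d = grads_train.shape[1]
    proj = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
    phi_tr, phi_te = grads_train @ proj, grads_test @ proj
    # (Phi^T Phi)^{-1} plays the role of an inverse-Hessian approximation.
    xtx_inv = np.linalg.pinv(phi_tr.T @ phi_tr)
    return phi_te @ xtx_inv @ phi_tr.T           # (n_test, n_train) scores
```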
Harshay Shah (@harshays_) · 4 years ago
Do input gradients highlight discriminative and task-relevant features? Our #NeurIPS2021 paper takes a three-pronged approach to evaluate the fidelity of input gradient attributions. Poster: session 3, spot C0 Paper: https://t.co/tnLwUBQtSh with @jainprateek_ and @pnetrapalli
Replies: 0 · Reposts: 13 · Likes: 57
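The attribution whose fidelity is being evaluated here is the plain input gradient. A minimal PyTorch version, with `model`, `x`, and `target` as placeholders for the experimental setup:

```python
import torch

def input_gradient_attribution(model, x, target):
    # model  : classifier taking a batch of inputs
    # x      : a single unbatched input tensor
    # target : index of the class logit to attribute
    x = x.clone().requires_grad_(True)
    logit = model(x.unsqueeze(0))[0, target]
    logit.backward()
    return x.grad.detach()   # saliency map, same shape as the input
```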
Harshay Shah (@harshays_) · 5 years ago
Neural nets can generalize well on test data, but often lack robustness to distributional shifts & adversarial attacks. Our #NeurIPS2020 paper on simplicity bias sheds light on this phenomenon. Poster: session #4, town A2, spot C0, 12pm ET today! Paper: https://t.co/PszvszwTr0
Replies: 0 · Reposts: 9 · Likes: 65
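A toy construction (mine, not the paper's) that shows the flavor of simplicity bias: when a clean "simple" feature and a noisier "complex" feature both predict the label, a linear classifier concentrates its weight on the simple one, so a shift or attack that corrupts only that feature breaks the model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)
simple = (2 * y - 1) + 0.1 * rng.standard_normal(n)    # clean, separable cue
complex_ = 0.5 * (2 * y - 1) + rng.standard_normal(n)  # weaker, noisy cue
X = np.stack([simple, complex_], axis=1)

clf = LogisticRegression().fit(X, y)
print(clf.coef_)  # weight concentrates on the simple feature, so a shift
                  # that corrupts only that feature flips the predictions
```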