Harshay Shah (@harshays_)
Followers: 657 · Following: 5K · Media: 3 · Statuses: 22
MoEs provide two knobs for scaling: model size (total params) + FLOPs-per-token (via active params). What’s the right scaling strategy? And how does it depend on the pretraining budget? Our work introduces sparsity-aware scaling laws for MoE LMs to tackle these questions! 🧵👇
🚨 One question that has always intrigued me: what roles do the different ways of increasing a model's capacity play (parameters, parallelizable compute, or sequential compute)? We explored this through the lens of MoEs:
1 · 6 · 36
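For readers outside the MoE world, here is a minimal sketch (hypothetical layer sizes, not the paper's configurations) of the two knobs in the tweet above: total parameters grow with the number of experts, while FLOPs-per-token track only the experts each token is actually routed to.

```python
# Toy MoE feed-forward layer with assumed sizes; illustrates the two scaling knobs.
d_model = 1024    # hidden size (assumed)
d_ff = 4096       # expert feed-forward width (assumed)
n_experts = 64    # total experts
top_k = 2         # experts each token is routed to

params_per_expert = 2 * d_model * d_ff        # up- and down-projection weights
total_params = n_experts * params_per_expert  # "model size" knob (total params)
active_params = top_k * params_per_expert     # "FLOPs-per-token" knob (active params)
sparsity = 1 - top_k / n_experts              # fraction of experts left unused per token

print(f"total params:  {total_params:,}")
print(f"active params: {active_params:,}")
print(f"sparsity:      {sparsity:.3f}")
```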
How can we really know if a chatbot is giving a reliable answer? 🧵 MIT CSAIL’s "ContextCite" tool can ID the parts of external context used to generate any particular statement from a language model, improving trust by helping users easily verify the statement:
3 · 14 · 48
How do black-box neural networks transform raw data into predictions? Inside these models are thousands of simple "components" working together. New MIT CSAIL research ( https://t.co/1qDZFIQUaZ) introduces a method that helps us understand how these components compose to affect predictions:
arxiv.org: How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question...
4 · 62 · 253
How is an LLM actually using the info given to it in its context? Is it misinterpreting anything or making things up? Introducing ContextCite: a simple method for attributing LLM responses back to the context: https://t.co/bm1t7nybbh w/ @bcohenwang, @harshays_, @kris_georgiev1
7 · 49 · 243
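A rough sketch of the context-attribution idea in the two ContextCite tweets above, based on my reading of the threads rather than the released code: ablate random subsets of the context sources, score the fixed response under each ablation, and fit a sparse linear surrogate whose weights indicate which sources the statement relied on. `logprob_of_response` is a hypothetical stand-in for a real LM call.

```python
import numpy as np
from sklearn.linear_model import Lasso

def logprob_of_response(kept_sources, query, response):
    """Hypothetical helper: log p(response | kept context sources, query) under some LM."""
    raise NotImplementedError

def attribute_context(sources, query, response, n_ablations=256, seed=0):
    rng = np.random.default_rng(seed)
    # Each mask keeps a random subset of context sources (1 = kept).
    masks = rng.integers(0, 2, size=(n_ablations, len(sources)))
    scores = np.array([
        logprob_of_response([s for s, keep in zip(sources, mask) if keep], query, response)
        for mask in masks
    ])
    # Sparse linear surrogate: coefficient j ~ how much source j supports the response.
    surrogate = Lasso(alpha=0.01).fit(masks, scores)
    return surrogate.coef_
```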
New work with @andrew_ilyas and @aleks_madry on tracing predictions back to individual components (conv filters, attn heads) in the model! Paper: https://t.co/zEJ3oV0wrF Thread: 👇
arxiv.org: How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question...
How do model components (conv filters, attn heads) collectively transform examples into predictions? Is it possible to somehow dissect how *every* model component contributes to a prediction? w/ @harshays_ @andrew_ilyas, we introduce a framework for tackling this question!
1 · 11 · 49
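Sketching the component-attribution framework described in the tweets above in the same ablate-and-regress spirit (my paraphrase, not the paper's implementation): ablate random subsets of components, record the model's output on a fixed example, and fit a linear model from ablation masks to outputs. `output_with_ablated` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import Ridge

def output_with_ablated(model, example, ablated_components):
    """Hypothetical helper: the model's output on `example` (e.g. correct-class margin)
    with the listed components (conv filters, attention heads) zeroed out."""
    raise NotImplementedError

def component_attributions(model, example, n_components, n_ablations=1000,
                           ablate_frac=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Each row ablates a small random subset of components (1 = ablated).
    masks = (rng.random((n_ablations, n_components)) < ablate_frac).astype(float)
    outputs = np.array([
        output_with_ablated(model, example, np.flatnonzero(mask)) for mask in masks
    ])
    # Linear fit: coefficient j estimates the effect of ablating component j.
    attr = Ridge(alpha=1.0).fit(masks, outputs)
    return attr.coef_
```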
If you are at #ICML2023 today, check out our work on ModelDiff, a model-agnostic framework for pinpointing differences between any two (supervised) learning algorithms! Poster: #407 at 2pm (Wednesday) Paper: https://t.co/sMXNJvm38M w/ @smsampark @andrew_ilyas @aleks_madry
0 · 12 · 52
TRAK, our latest work on data attribution ( https://t.co/OuY6lu8tfm), speeds up datamodels by up to 1000x! ➡️ our earlier work ModelDiff (w/ @harshays_ @smsampark @aleks_madry) can now compare any two learning algorithms in larger-scale settings. Try it out:
github.com: ModelDiff: A Framework for Comparing Learning Algorithms - MadryLab/modeldiff
You’re deploying an ML system, choosing between two models trained w/ diff algs. Same training data, same acc... how do you differentiate their behavior? ModelDiff ( https://t.co/wJI2dOAGc1) lets you compare *any* two learning algs! w/ @harshays_ @smsampark @andrew_ilyas (1/8)
1 · 13 · 42
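For context on the TRAK tweet above: a "datamodel" predicts a model's output on a target example as a linear function of which training examples were included. The brute-force sketch below (hypothetical `train_and_eval` helper, not TRAK itself) shows the expensive retraining loop that TRAK is designed to avoid.

```python
import numpy as np
from sklearn.linear_model import Lasso

def train_and_eval(train_indices, target_example):
    """Hypothetical helper: train a model on the given training subset and
    return its output (e.g. margin) on `target_example`."""
    raise NotImplementedError

def fit_datamodel(n_train, target_example, n_subsets=500, subset_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    masks = (rng.random((n_subsets, n_train)) < subset_frac).astype(float)  # 1 = included
    outputs = np.array([
        train_and_eval(np.flatnonzero(mask), target_example) for mask in masks
    ])
    dm = Lasso(alpha=0.01).fit(masks, outputs)
    return dm.coef_   # per-training-example influence on the target prediction
```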
You’re deploying an ML system, choosing between two models trained w/ diff algs. Same training data, same acc... how do you differentiate their behavior? ModelDiff ( https://t.co/wJI2dOAGc1) lets you compare *any* two learning algs! w/ @harshays_ @smsampark @andrew_ilyas (1/8)
4 · 67 · 298
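A rough sketch of one way the comparison in the ModelDiff tweets could work, based on my reading of the thread rather than the MadryLab/modeldiff code: given per-test-example data-attribution matrices for the two algorithms, find directions in training-example space that matter for algorithm A's predictions but are not explained by algorithm B's.

```python
import numpy as np

def distinguishing_directions(attr_a, attr_b, n_directions=3):
    """attr_a, attr_b: (n_test, n_train) attribution matrices (e.g. datamodel/TRAK scores)
    for models trained with algorithms A and B on the same training data."""
    # Orthonormal basis for the row space of B's attributions.
    q_b, _ = np.linalg.qr(attr_b.T)
    # Remove the part of A's attributions already explained by B.
    residual = attr_a - (attr_a @ q_b) @ q_b.T
    # Top principal directions of the residual: candidate training-data "subpopulations"
    # that drive A's predictions but not B's.
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    return vt[:n_directions]   # each row scores the training examples
```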
Do input gradients highlight discriminative and task-relevant features? Our #NeurIPS2021 paper takes a three-pronged approach to evaluate the fidelity of input gradient attributions. Poster: session 3, spot C0 Paper: https://t.co/tnLwUBQtSh with @jainprateek_ and @pnetrapalli
0 · 13 · 57
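A generic sanity check in the spirit of the question in the tweet above (not the paper's three-pronged evaluation): compute input-gradient attributions and test whether masking the top-attributed pixels hurts the true-class logit more than masking random pixels.

```python
import torch

def input_gradient(model, x, y):
    """Gradient of the true-class logit w.r.t. a single input x of shape (C, H, W)."""
    x = x.clone().requires_grad_(True)
    logit = model(x.unsqueeze(0))[0, y]
    logit.backward()
    return x.grad.abs().sum(dim=0)            # (H, W) attribution map

def masking_check(model, x, y, frac=0.1):
    attr = input_gradient(model, x, y)
    k = int(frac * attr.numel())
    top = torch.topk(attr.flatten(), k).indices
    rand = torch.randperm(attr.numel())[:k]

    def masked_logit(idx):
        xm = x.clone().reshape(x.shape[0], -1)
        xm[:, idx] = 0.0                       # zero out the selected pixels in every channel
        return model(xm.reshape(1, *x.shape))[0, y].item()

    # If the gradients are faithful, the first value should drop well below the second.
    return masked_logit(top), masked_logit(rand)
```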
Neural nets can generalize well on test data, but often lack robustness to distributional shifts & adversarial attacks. Our #NeurIPS2020 paper on simplicity bias sheds light on this phenomenon. Poster: session #4, town A2, spot C0, 12pm ET today! Paper: https://t.co/PszvszwTr0
0 · 9 · 65
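A toy illustration of simplicity bias (my construction, not the paper's datasets): two feature groups each perfectly predict the label, one linearly separable and one requiring an XOR-like rule. A small MLP trained on both tends to lean on the simple feature, so shuffling it collapses accuracy while shuffling the complex feature barely matters.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 5000
y = rng.integers(0, 2, n)

simple = (2 * y - 1) + 0.1 * rng.standard_normal(n)                    # sign(simple) == label
bits = rng.integers(0, 2, n)
complex_a = np.where(bits ^ y, -1, 1) + 0.1 * rng.standard_normal(n)   # XOR pair also
complex_b = np.where(bits, -1, 1) + 0.1 * rng.standard_normal(n)       # encodes the label
X = np.column_stack([simple, complex_a, complex_b])

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0).fit(X, y)

X_no_simple = X.copy()
X_no_simple[:, 0] = rng.permutation(X[:, 0])        # break the simple feature only
X_no_complex = X.copy()
X_no_complex[:, 1:] = rng.permutation(X[:, 1:])     # break the XOR feature only

print("clean accuracy:          ", clf.score(X, y))
print("simple feature shuffled: ", clf.score(X_no_simple, y))    # typically near chance
print("complex feature shuffled:", clf.score(X_no_complex, y))   # typically still high
```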