Ian Covert

@ianccovert

Followers
308
Following
283
Media
8
Statuses
40

Postdoc @Stanford, previously @uwcse @GoogleAI and @Columbia. Interested in deep learning and explainable AI

Palo Alto, CA
Joined February 2017
@ianccovert
Ian Covert
3 years
Making this class with Su-In, Hugh, and Chris was one of the most fun things I did in grad school. We covered a ton of material, so definitely check out all the slides we made. I'm excited to see how the course evolves in the next couple of years!
0
3
25
@ianccovert
Ian Covert
9 months
RT @Sahil1V: 📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitat…
0
61
0
@ianccovert
Ian Covert
9 months
RT @james_y_zou: Very excited to introduce locality alignment, an efficient post-training algorithm to improve your ViTs + VLMs, essentiall…
0
61
0
@ianccovert
Ian Covert
2 years
RT @soham_gadgil: How to perform dynamic feature selection without assumptions about the data distribution or fitting generative models? We…
0
2
0
@ianccovert
Ian Covert
2 years
Large models are tough because you may not be able to query the model thousands of times to get attributions (e.g., with KernelSHAP). This is something we've tackled in a couple of other papers. FastSHAP (ICLR'22): ViT Shapley (ICLR'23):
arxiv.org
Transformers have become a default architecture in computer vision, but understanding what drives their predictions remains a challenging problem. Current explanation approaches rely on attention...
1
2
5
@ianccovert
Ian Covert
2 years
Improving Shapley value computation is a rich topic, and we discuss some open problems here. One of the main outstanding issues (to me) is maintaining speed/accuracy with large models (e.g., large transformers).
1
0
1
@ianccovert
Ian Covert
2 years
For the second question (reducing the exponential complexity), we also found that algorithms were derived from different mathematical views of the Shapley value. E.g., it can be viewed as a weighted average of marginal contributions, or as the solution to a weighted least squares problem.
1
0
1
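A quick sketch of the two views mentioned above, using a small hypothetical 4-player game (an illustration only, not anything from the paper): the weighted-marginal-contribution formula and the constrained weighted least squares problem with Shapley kernel weights give identical values.

```python
import numpy as np
from itertools import combinations
from math import comb, factorial

n = 4

# Hypothetical cooperative game (illustration only): additive contributions
# plus one interaction between players 0 and 1.
def v(S):
    S = frozenset(S)
    out = sum(0.5 * (i + 1) for i in S)
    if {0, 1} <= S:
        out += 1.0
    return out

# View 1: weighted average of marginal contributions.
def shapley_marginal(v, n):
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(set(S) | {i}) - v(S))
    return phi

# View 2: constrained weighted least squares with Shapley kernel weights.
def shapley_wls(v, n):
    subsets = [S for k in range(1, n) for S in combinations(range(n), k)]
    Z = np.array([[int(i in S) for i in range(n)] for S in subsets], float)
    y = np.array([v(S) - v(()) for S in subsets])
    w = np.array([(n - 1) / (comb(n, len(S)) * len(S) * (n - len(S)))
                  for S in subsets])
    W = np.diag(w)
    # KKT system: minimize the weighted squared error subject to efficiency,
    # i.e. the attributions sum to v(N) - v(empty).
    A = np.block([[2 * Z.T @ W @ Z, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    b = np.concatenate([2 * Z.T @ W @ y, [v(range(n)) - v(())]])
    return np.linalg.solve(A, b)[:n]

phi1 = shapley_marginal(v, n)
phi2 = shapley_wls(v, n)
```

Both routes return the same attributions, which is why algorithms derived from either view target the same quantity.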
@ianccovert
Ian Covert
2 years
The complexity in these papers basically boils down to two choices: 1) how to remove feature information from the model, and 2) how to reduce the exponential complexity of the Shapley value calculation.
Tweet media one
1
0
1
@ianccovert
Ian Covert
2 years
In our recent Nature MI paper, we looked at the surprising number of algorithms that estimate Shapley values (whose computation scales exponentially with the number of players). There are a lot: we counted at least 24 papers on this topic! Paper:
1
8
33
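To make the exponential scaling concrete, here is a hedged toy sketch (my own illustration, not one of the surveyed algorithms' implementations): exact computation enumerates every subset, while a simple Monte Carlo estimator over random player orderings, one common family among the estimators, uses a fixed sample budget instead.

```python
import random
from itertools import combinations
from math import factorial

random.seed(0)
n = 6

# Hypothetical game: coalition size plus one pairwise interaction.
def v(S):
    S = set(S)
    return len(S) + (2.0 if {0, 1} <= S else 0.0)

def shapley_exact(v, n):
    # Enumerates every subset: 2^(n-1) evaluations per player,
    # which becomes infeasible as n grows.
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(set(S) | {i}) - v(S))
    return phi

def shapley_sampled(v, n, n_perms=2000):
    # Monte Carlo: average marginal contributions over random orderings.
    phi = [0.0] * n
    for _ in range(n_perms):
        order = random.sample(range(n), n)
        S = set()
        for i in order:
            before = v(S)
            S.add(i)
            phi[i] += v(S) - before
    return [p / n_perms for p in phi]

exact = shapley_exact(v, n)
approx = shapley_sampled(v, n)
```

The estimator's cost is controlled by the number of sampled orderings rather than 2^n, which is the basic trade-off the surveyed papers refine.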
@ianccovert
Ian Covert
2 years
RT @suinleelab: My amazing PhD student @ianccovert will present our work on ViT Shapley at #ICLR2023 soon -- Mon 1 May 11 CAT!
0
4
0
@ianccovert
Ian Covert
2 years
RT @chrislin97: We have an upcoming paper at ICLR 2023 on a new feature attribution method for explaining representations learned by unsupe…
0
5
0
@ianccovert
Ian Covert
2 years
RT @pandeyparul: Check out this course on #XAI by @suinleelab & @ianccovert. Very practical and nicely curated. Also points to some great…
0
3
0
@ianccovert
Ian Covert
2 years
This was joint work with the fantastic @ChanwooKim_ and @suinleelab from @uwcse/@uw_wail. And big shoutout to collaborators @SudarshanMukund @neiljethani + Rajesh from the paper this work builds on (9/9).
0
1
2
@ianccovert
Ian Covert
2 years
Our experiments used three image datasets and models as big as ViT-Large (arXiv needs to be updated), but there's still plenty of room to scale this up. My guess is that 1) it gets better with more data, and 2) it can help learn better representations than the original task alone (8/n).
1
0
2
@ianccovert
Ian Covert
2 years
We compared to many baselines (attention- and gradient-based methods) on many metrics, and SVs almost always performed best. For example, removing high-scoring patches makes the prediction drop quickly, and inserting high-scoring patches makes the prediction rise quickly (7/n).
Tweet media one
1
0
2
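The deletion-style metric above can be sketched with a toy additive "model" (my own stand-in; the paper's experiments use real ViTs): remove patches in descending score order and track the prediction, which should drop quickly under a good attribution.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches = 10
true_effect = rng.normal(size=num_patches)   # toy per-patch contributions

# Toy "model": the prediction is the sum of contributions of visible patches.
def predict(keep):
    return float(true_effect[keep].sum())

# Pretend the attribution method perfectly recovered each patch's effect.
scores = true_effect.copy()

def deletion_curve(scores, predict, num_patches):
    # Remove the highest-scoring patches first and record the prediction
    # after each removal.
    order = np.argsort(-scores)
    keep = np.ones(num_patches, dtype=bool)
    curve = [predict(keep)]
    for i in order:
        keep[i] = False
        curve.append(predict(keep))
    return curve

curve = deletion_curve(scores, predict, num_patches)
```

An insertion curve is the mirror image: start from nothing and add patches in descending score order, expecting the prediction to rise quickly.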
@ianccovert
Ian Covert
2 years
There are some details about the right way of removing patches (we use attention masking) and how to effectively train the ViT explainer (we fine-tuned an existing model). But the final approach is relatively simple, and it works (6/n).
1
0
1
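The attention-masking idea can be sketched in a single NumPy attention head (a minimal sketch with random stand-in tensors, not the paper's implementation): removed patches have their key scores set to minus infinity, so the remaining tokens place zero attention weight on them, which is equivalent to running attention over the kept tokens only.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim = 6, 8

# Random stand-ins for the query/key/value vectors of patch tokens.
Q = rng.normal(size=(num_patches, dim))
K = rng.normal(size=(num_patches, dim))
V = rng.normal(size=(num_patches, dim))

def masked_attention(Q, K, V, keep):
    # Scores toward "removed" patches are set to -inf, so the softmax
    # assigns them exactly zero attention weight.
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores[:, ~keep] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

keep = np.array([True, True, False, True, False, True])
out = masked_attention(Q, K, V, keep)
```

The appeal of this removal scheme is that the model never sees imputed pixel values; the masked patches simply contribute nothing to the attention output.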
@ianccovert
Ian Covert
2 years
In a previous paper, we showed that SVs have a variational characterization that can be used as an objective function and trained with SGD. We took that idea a bit further here, proving that the loss upper bounds the distance to the true SVs due to strong convexity (5/n).
Tweet media one
1
0
2
@ianccovert
Ian Covert
2 years
The issue is how to do this efficiently, because the number of patch subsets is exponential in the number of patches. So how do we handle large models like ViTs? Our solution: amortized optimization (4/n).
1
0
2
@ianccovert
Ian Covert
2 years
The question we’re trying to answer is *which patches influence the prediction.* And Shapley values are a surprisingly simple approach: they're like leave-one-out, but the effect of removing a patch is averaged across all sets of preceding patches (3/n).
1
0
3
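The "averaged leave-one-out" view above can be written down directly for a tiny hypothetical game (an illustration, not the paper's setup): average each player's marginal effect over every ordering of the preceding players.

```python
from itertools import permutations

# Hypothetical 3-player game: player 0 contributes 2.0, player 1 contributes
# 1.0, and players 0 and 2 share a 0.5 interaction.
def value(subset):
    s = frozenset(subset)
    return 2.0 * (0 in s) + 1.0 * (1 in s) + 0.5 * (0 in s and 2 in s)

def shapley_by_permutations(value, players):
    # For every ordering, add each player's marginal contribution when it
    # joins the players that precede it, then average over orderings.
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        included = []
        for p in order:
            before = value(included)
            included.append(p)
            phi[p] += value(included) - before
    return {p: phi[p] / len(perms) for p in players}

phi = shapley_by_permutations(value, [0, 1, 2])
```

Plain leave-one-out would miss the interaction term entirely; averaging over all preceding sets splits it fairly between players 0 and 2.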
@ianccovert
Ian Covert
2 years
TL;DR: we fine-tune a ViT to directly predict its Shapley values, without using a dataset of ground-truth examples. And it works quite well (2/n).
Tweet media one
1
0
8
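A minimal sketch of the amortized training idea in the TL;DR above, under heavy simplifying assumptions of my own: the "explainer" is a linear map instead of a ViT, and the game family is linear in the input, so the true Shapley values are known in closed form (they equal the input itself). Training on the Shapley-kernel objective alone, with no ground-truth attribution labels, still recovers them.

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(0)
n, m = 4, 20   # "patches" per input, number of training inputs

# Toy game family: v_x(S) = sum_{i in S} x_i, whose Shapley values are x.
X = rng.normal(size=(m, n))

subsets = [S for k in range(1, n) for S in combinations(range(n), k)]
Z = np.array([[i in S for i in range(n)] for S in subsets], float)
w = np.array([(n - 1) / (comb(n, len(S)) * len(S) * (n - len(S)))
              for S in subsets])

W = np.zeros((n, n))                   # linear "explainer" (stand-in for a ViT)
P = np.eye(n) - np.ones((n, n)) / n    # Jacobian of the efficiency adjustment

for _ in range(1000):
    grad = np.zeros((n, n))
    for x in X:
        phi = W @ x
        phi = phi + (x.sum() - phi.sum()) / n   # additive efficiency step
        err = Z @ phi - Z @ x                   # residual vs v_x(S) - v_x({})
        grad += np.outer(P @ (2 * Z.T @ (w * err)), x)
    W -= 0.2 * grad / m

# After training, one forward pass yields the attributions for a new input.
x_test = rng.normal(size=n)
phi = W @ x_test
phi = phi + (x_test.sum() - phi.sum()) / n
```

The point of amortization is the last three lines: attribution for a fresh input costs a single forward pass, with no per-input optimization or subset enumeration.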