Ian Covert

@ianccovert

Followers
370
Following
313
Media
8
Statuses
40

Postdoc @Stanford, previously @uwcse @GoogleAI and @Columbia. Interested in deep learning and explainable AI

Palo Alto, CA
Joined February 2017
@ianccovert
Ian Covert
3 years
Making this class with Su-In, Hugh and Chris was one of the most fun things I did in grad school. We covered a ton of material, definitely check out all the slides we made https://t.co/PFfm6omdOy I'm excited to see how the course evolves in the next couple years!
0
3
25
@Sahil1V
Sahil Verma
1 year
📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitate it. This question is critical for understanding and mitigating the copyright and privacy infringements of these models! https://t.co/bvdVU1M0Hh
10
61
226
@james_y_zou
James Zou
1 year
Very excited to introduce locality alignment, an efficient post-training algorithm to improve your ViTs + VLMs, essentially for free🚀 Local align = new self-supervised objective ensuring that encoder captures fine-grained spatial info. No new data needed. Here's the idea 1/3
5
58
301
@soham_gadgil
Soham Gadgil
2 years
How can we perform dynamic feature selection without assumptions about the data distribution or fitting generative models? We develop a learning approach that estimates the conditional mutual information in a discriminative fashion for selecting features. https://t.co/6dCHlJJA9m
1
2
9
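As a toy illustration of the idea (not the paper's method or code; the distribution below is invented), the greedy selection criterion is the conditional mutual information I(y; x_i | x_S), which can be written as an expected KL divergence between classifier posteriors. That is a quantity one can learn to predict directly from a classifier's outputs, with no generative model of p(x):

```python
import numpy as np

# Toy distribution (hypothetical, for illustration only): binary label y,
# two binary features. Feature 0 drives y; feature 1 is pure noise.
# p(x0=1) = p(x1=1) = 0.5, p(y=1 | x0=1) = 0.9, p(y=1 | x0=0) = 0.2.

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

p_y = 0.5 * 0.9 + 0.5 * 0.2  # marginal p(y=1)

# CMI of each feature with y (nothing observed yet), computed as the expected
# KL between the posterior p(y|x_i) and the prior p(y). This is the quantity
# a discriminative estimator predicts from classifier outputs alone.
cmi = np.array([
    0.5 * kl_bernoulli(0.9, p_y) + 0.5 * kl_bernoulli(0.2, p_y),  # feature 0
    0.5 * kl_bernoulli(p_y, p_y) + 0.5 * kl_bernoulli(p_y, p_y),  # feature 1 (independent of y)
])

best = int(np.argmax(cmi))  # greedy dynamic selection picks the highest-CMI feature
```

Here feature 1 carries no information about y, so its CMI is exactly zero and greedy selection picks feature 0 first.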
@ianccovert
Ian Covert
2 years
Large models are tough because you may not be able to query the model thousands of times to get attributions (e.g., KernelSHAP). This is something we've tackled in a couple of other papers: FastSHAP (ICLR'22): https://t.co/dGGUEaHs5j ViT Shapley (ICLR'23):
arxiv.org
Transformers have become a default architecture in computer vision, but understanding what drives their predictions remains a challenging problem. Current explanation approaches rely on attention...
1
2
5
@ianccovert
Ian Covert
2 years
Improving Shapley value computation is a rich topic, and we discuss some open problems here. One of the main outstanding issues (to me) is maintaining speed/accuracy with large models (e.g., large transformers)
1
0
1
@ianccovert
Ian Covert
2 years
For the second question (reducing exp. complexity) we also found that algorithms were derived from different mathematical views of the Shapley value. E.g., it can be viewed as a weighted average of marginal contributions, or as the solution to a weighted least squares problem
1
0
1
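The two views can be checked against each other on a toy cooperative game (a sketch for exposition; the game below is made up, and both views are computed by brute force):

```python
import itertools
import math
import numpy as np

def value(S):
    """Toy cooperative game (hypothetical): quadratic in coalition size,
    plus a bonus when player 0 participates."""
    return len(S) ** 2 + (3.0 if 0 in S else 0.0)

n = 3

def shapley_marginal(v, n):
    """View 1: weighted average of marginal contributions over all subsets."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

def shapley_wls(v, n):
    """View 2: solution of a weighted least squares problem with the Shapley
    kernel, constrained so attributions sum to v(N) - v(empty set)."""
    rows, weights, targets = [], [], []
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            z = np.zeros(n); z[list(S)] = 1.0
            rows.append(z)
            weights.append((n - 1) / (math.comb(n, r) * r * (n - r)))
            targets.append(v(set(S)) - v(set()))
    Z, W, y = np.array(rows), np.diag(weights), np.array(targets)
    A, b = Z.T @ W @ Z, Z.T @ W @ y
    ones = np.ones(n)
    total = v(set(range(n))) - v(set())
    Ainv = np.linalg.inv(A)
    # Lagrange multiplier enforcing the efficiency constraint.
    lam = (ones @ Ainv @ b - total) / (ones @ Ainv @ ones)
    return Ainv @ (b - lam * ones)

phi1 = shapley_marginal(value, n)
phi2 = shapley_wls(value, n)
assert np.allclose(phi1, phi2)  # both views give the same attributions
```

Both routes recover the same values (here [6, 3, 3]), which is why algorithms derived from either view target the same quantity.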
@ianccovert
Ian Covert
2 years
The complexity in these papers basically boils down to two choices: 1) how to remove feature information from the model, and 2) how to reduce the exponential complexity of the Shapley value calculation
1
0
1
@ianccovert
Ian Covert
2 years
In our recent Nature MI paper, we looked at the surprising number of algorithms that estimate Shapley values (whose computation scales exponentially with the number of players). There are a lot: we counted at least 24 papers on this topic! Paper:
1
8
33
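One of the simplest estimator families in that space is permutation sampling, which replaces the exponential enumeration with a Monte Carlo average of marginal contributions. A minimal sketch on a made-up game:

```python
import itertools
import math
import random
import numpy as np

def value(S):
    # Hypothetical toy game: quadratic in coalition size, bonus for player 0.
    return len(S) ** 2 + (3.0 if 0 in S else 0.0)

def shapley_exact(v, n):
    # Exact computation: enumerates all subsets, so cost grows as 2^n.
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

def shapley_permutation(v, n, n_samples=50000, seed=0):
    # Monte Carlo estimate: average each player's marginal contribution over
    # random orderings (linear cost per sample instead of exponential).
    rng = random.Random(seed)
    phi = np.zeros(n)
    for _ in range(n_samples):
        perm = list(range(n))
        rng.shuffle(perm)
        S = set()
        for i in perm:
            before = v(S)
            S.add(i)
            phi[i] += v(S) - before
    return phi / n_samples

n = 4
exact = shapley_exact(value, n)
estimate = shapley_permutation(value, n)
assert np.allclose(exact, estimate, atol=0.1)
```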
@suinleelab
Su-In Lee
3 years
My amazing PhD student @ianccovert will present our work on ViT Shapley at #ICLR2023 soon -- Mon 1 May 11 CAT!
@ianccovert
Ian Covert
3 years
If you want to know what your ViT pays attention to...you might not want to use attention values! Shapley values can do this better, and now they can even do it efficiently. Check out our new paper (ICLR spotlight) https://t.co/yhraLMTDLQ 🧵⬇️
0
4
10
@chrislin97
Chris Lin
3 years
We have an upcoming paper at ICLR 2023 on a new feature attribution method for explaining representations learned by unsupervised models! https://t.co/kiLQbF8S4i This was joint work with the fantastic @HughChen18 @ChanwooKim_ and my advisor @suinleelab. (1/n)
1
5
23
@pandeyparul
Parul Pandey
3 years
Check out this course on #XAI by @suinleelab & @ianccovert. Very practical and nicely curated. Also points to some great papers on the topic. The course covers a broad set of principles and techniques. Slides are available here: https://t.co/g92FV8Bnko #ResponsibleAI
0
3
5
@ianccovert
Ian Covert
3 years
This was joint work with the fantastic @ChanwooKim_ and @suinleelab from @uwcse/@uw_wail. And big shoutout to collaborators @SudarshanMukund @neiljethani + Rajesh from the paper this work builds on https://t.co/dGGUEaHs5j (9/9)
0
1
2
@ianccovert
Ian Covert
3 years
Our experiments used three image datasets and models as big as ViT-Large (arXiv needs to be updated), but there's still plenty of room to scale this up. My guess is that 1) it gets better with more data, and 2) it can help learn better representations than the original task does (8/n)
1
0
2
@ianccovert
Ian Covert
3 years
We compared to many baselines (attention-/gradient-based methods) on many metrics, and SVs almost always performed best. For example, removing high-scoring patches makes the pred drop quickly, and inserting high-scoring patches makes the pred rise quickly (7/n)
1
0
2
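The deletion metric can be sketched roughly as follows (a toy stand-in model, not our experimental setup; insertion is the same idea starting from a blank input):

```python
import numpy as np

def deletion_curve(predict, x, scores, mask_value=0.0):
    """Remove patches from highest to lowest attribution score and record the
    model's prediction after each removal. A faithful explanation should make
    the prediction fall quickly at the start of the curve."""
    order = np.argsort(-scores)
    x = x.copy()
    curve = [predict(x)]
    for idx in order:
        x[idx] = mask_value
        curve.append(predict(x))
    return np.array(curve)

# Toy stand-in for a patch-based model (hypothetical): a linear score over patches.
w = np.array([0.5, 2.0, 0.1, 1.0])
predict = lambda patches: float(patches @ w)
x = np.ones(4)
scores = w * x  # for a linear model, exact contributions serve as attributions

curve = deletion_curve(predict, x, scores)
drops = -np.diff(curve)
assert np.all(drops[:-1] >= drops[1:])  # biggest prediction drops come first
```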
@ianccovert
Ian Covert
3 years
There are some details about the right way of removing patches (we use attention masking) and how to effectively train the ViT explainer (we fine-tuned an existing model). But the final approach is relatively simple, and it works (6/n)
1
0
1
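Attention masking can be sketched as follows (a minimal single-head version, assuming masking is applied at the keys so no token can attend to a removed patch; the paper's details may differ):

```python
import numpy as np

def masked_attention(Q, K, V, keep):
    """Single-head self-attention where removed patches (keep[j] = False) get
    -inf attention logits, so they contribute nothing to any token's output."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    logits[:, ~keep] = -np.inf
    # Numerically stable softmax over the remaining (finite) logits.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
keep = np.array([True, True, False, True, False])  # patches 2 and 4 removed

out, weights = masked_attention(Q, K, V, keep)
assert np.allclose(weights[:, ~keep], 0.0)     # no attention to removed patches
assert np.allclose(weights.sum(axis=-1), 1.0)  # rows still normalize
```

The appeal over zeroing out pixels is that the network never mixes in information from the removed patches, rather than seeing a modified image.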
@ianccovert
Ian Covert
3 years
In a previous paper, we showed that SVs have a variational characterization that can be used as an objective function + trained with SGD. We took that idea a bit further here, proving that the loss upper bounds distance to true SVs due to strong convexity (5/n)
1
0
2
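In the notation of the weighted least squares view, the objective can be sketched like this (a paraphrase, not the papers' exact statement; phi-star denotes the true Shapley values and mu the strong convexity constant):

```latex
% Sketch of the variational objective (weighted least squares view):
\mathcal{L}(\phi) = \mathbb{E}_{S \sim p(S)}\!\left[\big(v_x(S) - v_x(\emptyset) - \mathbf{1}_S^\top \phi\big)^2\right],
\qquad p(S) \propto \frac{n-1}{\binom{n}{|S|}\,|S|\,(n-|S|)}
% Strong convexity (constant \mu) turns excess loss into an error bound:
\lVert \phi - \phi^\star \rVert_2^2 \le \frac{2}{\mu}\big(\mathcal{L}(\phi) - \mathcal{L}(\phi^\star)\big)
```

The bound is what licenses SGD training: driving the loss down provably drives the predicted attributions toward the true Shapley values.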
@ianccovert
Ian Covert
3 years
The issue is how to do this efficiently, because the number of patch subsets is exponential in the number of patches. So how do we handle large models like ViTs? Our solution: amortized optimization https://t.co/Xwl7XmVxCq (4/n)
1
0
2
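A minimal sketch of the amortized idea (made-up additive game and a linear explainer for illustration; the real method trains a ViT head, not a linear map):

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
n = 4                                     # number of "patches"
w_true = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical per-patch effects

def v(x, S):
    # Hypothetical additive model-with-masking: prediction using only patches in S.
    m = np.zeros(n); m[list(S)] = 1.0
    return float((x * m) @ w_true)

# Shapley kernel distribution over non-trivial subsets.
subsets = [S for r in range(1, n) for S in itertools.combinations(range(n), r)]
p = np.array([(n - 1) / (math.comb(n, len(S)) * len(S) * (n - len(S))) for S in subsets])
p /= p.sum()

# Amortized explainer: a linear map x -> phi(x), fit by SGD on the weighted
# least squares objective, so one forward pass yields attributions for any x.
theta = np.zeros((n, n))
lr = 0.05
for _ in range(4000):
    x = rng.uniform(-1.0, 1.0, size=n)
    S = subsets[rng.choice(len(subsets), p=p)]
    z = np.zeros(n); z[list(S)] = 1.0
    resid = z @ (theta @ x) - (v(x, S) - v(x, ()))
    theta -= lr * resid * np.outer(z, x)  # gradient of 0.5 * resid**2

# For an additive game the true Shapley values are x_i * w_i.
x_test = rng.uniform(-1.0, 1.0, size=n)
assert np.allclose(theta @ x_test, x_test * w_true, atol=0.1)
```

The key trade: instead of running an exponential-cost estimator per input, training cost is paid once and each new input gets attributions from a single forward pass.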
@ianccovert
Ian Covert
3 years
The question we’re trying to answer is *which patches influence the prediction.* And Shapley values are a surprisingly simple approach: they're like leave-one-out, but the effect of removing a patch is averaged across all sets of preceding patches (3/n)
1
0
3
@ianccovert
Ian Covert
3 years
TL;DR: we fine-tune a ViT to directly predict its Shapley values, without using a dataset of ground truth examples. And it works quite well (2/n)
1
0
8