Ian Covert
@ianccovert
Followers: 370 · Following: 313 · Media: 8 · Statuses: 40
Postdoc @Stanford, previously @uwcse @GoogleAI and @Columbia. Interested in deep learning and explainable AI
Palo Alto, CA
Joined February 2017
Making this class with Su-In, Hugh and Chris was one of the most fun things I did in grad school. We covered a ton of material; definitely check out all the slides we made: https://t.co/PFfm6omdOy I'm excited to see how the course evolves over the next couple of years!
0 replies · 3 retweets · 25 likes
📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitate it. This question is critical for understanding and mitigating the copyright and privacy infringements of these models! https://t.co/bvdVU1M0Hh
10 replies · 61 retweets · 226 likes
Very excited to introduce locality alignment, an efficient post-training algorithm to improve your ViTs + VLMs, essentially for free 🚀 Locality alignment = a new self-supervised objective ensuring that the encoder captures fine-grained spatial info. No new data needed. Here's the idea (1/3)
5 replies · 58 retweets · 301 likes
How can we perform dynamic feature selection without assumptions about the data distribution or fitting generative models? We develop a learning approach that estimates the conditional mutual information in a discriminative fashion to select features. https://t.co/6dCHlJJA9m
1 reply · 2 retweets · 9 likes
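A minimal sketch of the greedy selection loop behind the dynamic feature selection approach above, assuming a hypothetical value_net that scores each candidate feature's conditional mutual information given the features observed so far (not the paper's exact implementation):

import torch

def greedy_select(x, value_net, predictor, budget):
    # x: (d,) feature vector; value_net and predictor both take (masked input, mask).
    d = x.shape[-1]
    mask = torch.zeros(d)
    for _ in range(budget):
        scores = value_net(x * mask, mask)        # estimated CMI for each unobserved feature
        scores[mask.bool()] = -float("inf")       # never re-select an observed feature
        mask[scores.argmax()] = 1.0               # "acquire" the most informative feature
    return predictor(x * mask, mask), mask

At test time the same network drives acquisition one feature at a time, stopping once the budget is reached.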
This was work done with @HughChen18 @scottlundberg and of course our advisor @suinleelab. NMI version: https://t.co/5OWuagexz6 arXiv version:
[arxiv.org link preview: "Feature attributions based on the Shapley value are popular for explaining machine learning models; however, their estimation is complex from both a theoretical and computational standpoint..."]
0 replies · 0 retweets · 1 like
Large models are tough because you may not be able to query the model thousands of times to get attributions (e.g., KernelSHAP). This is something we've tackled in a couple of other papers: FastSHAP (ICLR'22): https://t.co/dGGUEaHs5j ViT Shapley (ICLR'23):
[arxiv.org link preview: "Transformers have become a default architecture in computer vision, but understanding what drives their predictions remains a challenging problem. Current explanation approaches rely on attention..."]
1 reply · 2 retweets · 5 likes
Improving Shapley value computation is a rich topic, and we discuss some open problems here. One of the main outstanding issues (to me) is maintaining speed/accuracy with large models (e.g., large transformers)
1 reply · 0 retweets · 1 like
For the second question (reducing the exponential complexity), we also found that algorithms were derived from different mathematical views of the Shapley value. E.g., it can be viewed as a weighted average of marginal contributions, or as the solution to a weighted least squares problem
1 reply · 0 retweets · 1 like
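Both views mentioned above can be written out; in standard notation (feature set D with d = |D|, value function v), which may differ slightly from the paper's, the Shapley value is the weighted average marginal contribution

\phi_i(v) = \frac{1}{d} \sum_{S \subseteq D \setminus \{i\}} \binom{d-1}{|S|}^{-1} \left[ v(S \cup \{i\}) - v(S) \right],

and, equivalently, the solution to a weighted least squares problem with the Shapley kernel as weights,

\phi(v) = \operatorname*{arg\,min}_{\beta} \sum_{\emptyset \neq S \subsetneq D} \frac{d-1}{\binom{d}{|S|}\, |S|\,(d-|S|)} \Big( v(S) - v(\emptyset) - \sum_{i \in S} \beta_i \Big)^2 \quad \text{s.t.} \quad \sum_{i \in D} \beta_i = v(D) - v(\emptyset).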
The differences between these papers basically boil down to two choices: 1) how to remove feature information from the model, and 2) how to reduce the exponential complexity of the Shapley value calculation
1 reply · 0 retweets · 1 like
In our recent Nature MI paper, we looked at the surprising number of algorithms that estimate Shapley values (whose computation scales exponentially with the number of players). There are a lot: we counted at least 24 papers on this topic! Paper:
1 reply · 8 retweets · 33 likes
My amazing PhD student @ianccovert will present our work on ViT Shapley at #ICLR2023 soon -- Mon 1 May, 11:00 CAT!
If you want to know what your ViT pays attention to...you might not want to use attention values! Shapley values can do this better, and now they can even do it efficiently. Check out our new paper (ICLR spotlight) https://t.co/yhraLMTDLQ 🧵⬇️
0 replies · 4 retweets · 10 likes
We have an upcoming paper at ICLR 2023 on a new feature attribution method for explaining representations learned by unsupervised models! https://t.co/kiLQbF8S4i This was joint work with the fantastic @HughChen18 @ChanwooKim_ and my advisor @suinleelab. (1/n)
1 reply · 5 retweets · 23 likes
Check out this course on #XAI by @suinleelab & @ianccovert. Very practical and nicely curated. Also points to some great papers on the topic. The course covers a broad set of principles and techniques. Slides are available here: https://t.co/g92FV8Bnko
#ResponsibleAI
0 replies · 3 retweets · 5 likes
This was joint work with the fantastic @ChanwooKim_ and @suinleelab from @uwcse/@uw_wail. And big shoutout to collaborators @SudarshanMukund @neiljethani + Rajesh from the paper this work builds on https://t.co/dGGUEaHs5j (9/9)
0 replies · 1 retweet · 2 likes
Our experiments used three image datasets and models as big as ViT-Large (arXiv needs to be updated), but there's still plenty of room to scale this up. My guess is that 1) it gets better with more data, 2) it can help learn better representations than the original task (8/n)
1 reply · 0 retweets · 2 likes
We compared to many baselines (attention-/gradient-based methods) on many metrics, and SVs almost always performed best. For example, removing high-scoring patches makes the prediction drop quickly, and inserting high-scoring patches makes the prediction rise quickly (7/n)
1 reply · 0 retweets · 2 likes
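A minimal sketch of the deletion-style metric described above, assuming a model callable that accepts a per-patch keep/remove mask (a hypothetical interface, not the paper's evaluation code):

import torch

def deletion_curve(model, x, attributions, step=1):
    # Remove patches from highest- to lowest-scoring and track the target-class probability.
    order = attributions.argsort(descending=True)
    n = len(order)
    target = model(x, torch.ones(n)).argmax()
    curve = []
    for k in range(0, n + 1, step):
        mask = torch.ones(n)
        mask[order[:k]] = 0.0                     # delete the k highest-scoring patches
        curve.append(model(x, mask)[target].item())
    return curve                                   # a good attribution method makes this drop quickly

The insertion curve is the mirror image: start from an empty mask and add the highest-scoring patches first.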
There are some details about the right way of removing patches (we use attention masking) and how to effectively train the ViT explainer (we fine-tuned an existing model). But the final approach is relatively simple, and it works (6/n)
1 reply · 0 retweets · 1 like
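For the patch-removal detail, here is a minimal sketch of attention masking using PyTorch's built-in multi-head attention (an illustration of the idea, not the paper's exact implementation):

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)

def masked_self_attention(tokens, patch_mask):
    # tokens: (batch, num_patches, 768); patch_mask: (batch, num_patches), 1 = keep, 0 = remove.
    key_padding_mask = (patch_mask == 0)          # True entries are excluded as attention keys
    out, _ = attn(tokens, tokens, tokens, key_padding_mask=key_padding_mask)
    return out                                     # removed patches contribute nothing to the other tokens

Because held-out patches are excluded as keys, they have no influence on the remaining tokens' representations.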
In a previous paper, we showed that SVs have a variational characterization that can be used as an objective function + trained with SGD. We took that idea a bit further here, proving that the loss upper bounds the distance to the true SVs due to strong convexity (5/n)
1 reply · 0 retweets · 2 likes
The issue is how to do this efficiently, because the number of patch subsets is exponential in the number of patches. So how do we handle large models like ViTs? Our solution: amortized optimization https://t.co/Xwl7XmVxCq (4/n)
1 reply · 0 retweets · 2 likes
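A minimal sketch of what amortized estimation can look like here (FastSHAP-style, with hypothetical interfaces and uniform subset sampling instead of the Shapley kernel, to keep it short):

import torch

def amortized_shapley_loss(explainer, value_fn, x, num_subsets=32):
    # explainer(x): per-patch attribution vector of length d.
    # value_fn(x, S): target-class prediction with only the patches in binary mask S kept.
    phi = explainer(x)
    d = phi.shape[0]
    base = value_fn(x, torch.zeros(d))            # prediction with all patches removed
    loss = 0.0
    for _ in range(num_subsets):
        S = (torch.rand(d) < 0.5).float()         # random patch subset
        loss = loss + (value_fn(x, S) - base - (phi * S).sum()) ** 2
    return loss / num_subsets

Minimizing this over many images trains a single explainer network, so at test time Shapley value estimates come from one forward pass rather than thousands of model evaluations.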
The question we’re trying to answer is *which patches influence the prediction.* And Shapley values are a surprisingly simple approach: they're like leave-one-out, but the effect of removing a patch is averaged across all sets of preceding patches (3/n)
1 reply · 0 retweets · 3 likes
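Written directly, that averaging over preceding patches is the classic permutation form of the Shapley value; a brute-force sketch (only feasible for a handful of patches, which is exactly why the amortized approach matters):

from itertools import permutations

def exact_shapley(value_fn, num_patches):
    # value_fn: maps a set of included patch indices to the target-class prediction (hypothetical interface).
    phi = [0.0] * num_patches
    perms = list(permutations(range(num_patches)))
    for perm in perms:
        included = set()
        for i in perm:
            before = value_fn(frozenset(included))
            included.add(i)
            phi[i] += value_fn(frozenset(included)) - before   # marginal contribution of patch i
    return [p / len(perms) for p in phi]

Each patch's score is its average effect on the prediction, taken over every possible order in which patches could be revealed.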
TL;DR: we fine-tune a ViT to directly predict its Shapley values, without using a dataset of ground truth examples. And it works quite well (2/n)
1 reply · 0 retweets · 8 likes