Ian Covert

@ianccovert

Followers
308
Following
283
Media
8
Statuses
40

Postdoc @Stanford, previously @uwcse @GoogleAI and @Columbia. Interested in deep learning and explainable AI

Palo Alto, CA
Joined February 2017
@ianccovert
Ian Covert
3 years
Making this class with Su-In, Hugh, and Chris was one of the most fun things I did in grad school. We covered a ton of material, so definitely check out all the slides we made. I'm excited to see how the course evolves in the next couple of years!
0
3
25
@ianccovert
Ian Covert
9 months
RT @Sahil1V: 📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitat…
0
61
0
@ianccovert
Ian Covert
9 months
RT @james_y_zou: Very excited to introduce locality alignment, an efficient post-training algorithm to improve your ViTs + VLMs, essentiall…
0
61
0
@ianccovert
Ian Covert
2 years
RT @soham_gadgil: How to perform dynamic feature selection without assumptions about the data distribution or fitting generative models? We…
0
2
0
@ianccovert
Ian Covert
2 years
Large models are tough because you may not be able to query the model thousands of times to get attributions (e.g., with KernelSHAP). This is something we've tackled in a couple of other papers. FastSHAP (ICLR'22): ViT Shapley (ICLR'23):
arxiv.org
Transformers have become a default architecture in computer vision, but understanding what drives their predictions remains a challenging problem. Current explanation approaches rely on attention...
1
2
5
@ianccovert
Ian Covert
2 years
Improving Shapley value computation is a rich topic, and we discuss some open problems here. One of the main outstanding issues (to me) is maintaining speed/accuracy with large models (e.g., large transformers).
1
0
1
@ianccovert
Ian Covert
2 years
For the second question (reducing the exponential complexity), we also found that algorithms were derived from different mathematical views of the Shapley value. E.g., it can be viewed as a weighted average of marginal contributions, or as the solution to a weighted least squares problem.
1
0
1
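A quick sketch of the two views mentioned above, using a small hypothetical 4-player game (an illustration only, not anything from the paper): the weighted-marginal-contribution formula and the constrained weighted least squares problem with Shapley kernel weights give identical values.

```python
import numpy as np
from itertools import combinations
from math import comb, factorial

n = 4

# Hypothetical cooperative game (illustration only): additive contributions
# plus one interaction between players 0 and 1.
def v(S):
    S = frozenset(S)
    out = sum(0.5 * (i + 1) for i in S)
    if {0, 1} <= S:
        out += 1.0
    return out

# View 1: weighted average of marginal contributions.
def shapley_marginal(v, n):
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(set(S) | {i}) - v(S))
    return phi

# View 2: constrained weighted least squares with Shapley kernel weights.
def shapley_wls(v, n):
    subsets = [S for k in range(1, n) for S in combinations(range(n), k)]
    Z = np.array([[int(i in S) for i in range(n)] for S in subsets], float)
    y = np.array([v(S) - v(()) for S in subsets])
    w = np.array([(n - 1) / (comb(n, len(S)) * len(S) * (n - len(S)))
                  for S in subsets])
    W = np.diag(w)
    # KKT system: minimize the weighted squared error subject to efficiency,
    # i.e. the attributions sum to v(N) - v(empty).
    A = np.block([[2 * Z.T @ W @ Z, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    b = np.concatenate([2 * Z.T @ W @ y, [v(range(n)) - v(())]])
    return np.linalg.solve(A, b)[:n]

phi1 = shapley_marginal(v, n)
phi2 = shapley_wls(v, n)
```

Both routes return the same attributions, which is why algorithms derived from either view target the same quantity.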
@ianccovert
Ian Covert
2 years
The complexity in these papers basically boils down to two choices: 1) how to remove feature information from the model, and 2) how to reduce the exponential complexity of the Shapley value calculation.
Tweet media one
1
0
1
@ianccovert
Ian Covert
2 years
In our recent Nature MI paper, we looked at the surprising number of algorithms that estimate Shapley values (whose computation scales exponentially with the number of players). There are a lot: we counted at least 24 papers on this topic! Paper:
1
8
33
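To make the exponential scaling concrete, here is a hedged toy sketch (my own illustration, not one of the surveyed algorithms' implementations): exact computation enumerates every subset, while a simple Monte Carlo estimator over random player orderings, one common family among the estimators, uses a fixed sample budget instead.

```python
import random
from itertools import combinations
from math import factorial

random.seed(0)
n = 6

# Hypothetical game: coalition size plus one pairwise interaction.
def v(S):
    S = set(S)
    return len(S) + (2.0 if {0, 1} <= S else 0.0)

def shapley_exact(v, n):
    # Enumerates every subset: 2^(n-1) evaluations per player,
    # which becomes infeasible as n grows.
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(set(S) | {i}) - v(S))
    return phi

def shapley_sampled(v, n, n_perms=2000):
    # Monte Carlo: average marginal contributions over random orderings.
    phi = [0.0] * n
    for _ in range(n_perms):
        order = random.sample(range(n), n)
        S = set()
        for i in order:
            before = v(S)
            S.add(i)
            phi[i] += v(S) - before
    return [p / n_perms for p in phi]

exact = shapley_exact(v, n)
approx = shapley_sampled(v, n)
```

The estimator's cost is controlled by the number of sampled orderings rather than 2^n, which is the basic trade-off the surveyed papers refine.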
@ianccovert
Ian Covert
2 years
RT @suinleelab: My amazing PhD student @ianccovert will present our work on ViT Shapley at #ICLR2023 soon -- Mon 1 May 11 CAT!
0
4
0
@ianccovert
Ian Covert
2 years
RT @chrislin97: We have an upcoming paper at ICLR 2023 on a new feature attribution method for explaining representations learned by unsupe…
0
5
0
@ianccovert
Ian Covert
2 years
RT @pandeyparul: Check out this course on #XAI by @suinleelab & @ianccovert. Very practical and nicely curated. Also points to some great…
0
3
0
@ianccovert
Ian Covert
2 years
This was joint work with the fantastic @ChanwooKim_ and @suinleelab from @uwcse/@uw_wail. And big shoutout to collaborators @SudarshanMukund @neiljethani + Rajesh from the paper this work builds on (9/9).
0
1
2
@ianccovert
Ian Covert
2 years
Our experiments used three image datasets and models as big as ViT-Large (arXiv needs to be updated), but there's still plenty of room to scale this up. My guess is that 1) it gets better with more data, and 2) it can help learn better representations than the original task alone (8/n).
1
0
2
@ianccovert
Ian Covert
2 years
We compared to many baselines (attention- and gradient-based methods) on many metrics, and SVs almost always performed best. For example, removing high-scoring patches makes the prediction drop quickly, and inserting high-scoring patches makes the prediction rise quickly (7/n).
Tweet media one
1
0
2
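The deletion-style metric above can be sketched with a toy additive "model" (my own stand-in; the paper's experiments use real ViTs): remove patches in descending score order and track the prediction, which should drop quickly under a good attribution.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches = 10
true_effect = rng.normal(size=num_patches)   # toy per-patch contributions

# Toy "model": the prediction is the sum of contributions of visible patches.
def predict(keep):
    return float(true_effect[keep].sum())

# Pretend the attribution method perfectly recovered each patch's effect.
scores = true_effect.copy()

def deletion_curve(scores, predict, num_patches):
    # Remove the highest-scoring patches first and record the prediction
    # after each removal.
    order = np.argsort(-scores)
    keep = np.ones(num_patches, dtype=bool)
    curve = [predict(keep)]
    for i in order:
        keep[i] = False
        curve.append(predict(keep))
    return curve

curve = deletion_curve(scores, predict, num_patches)
```

An insertion curve is the mirror image: start from nothing and add patches in descending score order, expecting the prediction to rise quickly.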
@ianccovert
Ian Covert
2 years
There are some details about the right way of removing patches (we use attention masking) and how to effectively train the ViT explainer (we fine-tuned an existing model). But the final approach is relatively simple, and it works (6/n).
1
0
1
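The attention-masking idea can be sketched in a single NumPy attention head (a minimal sketch with random stand-in tensors, not the paper's implementation): removed patches have their key scores set to minus infinity, so the remaining tokens place zero attention weight on them, which is equivalent to running attention over the kept tokens only.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim = 6, 8

# Random stand-ins for the query/key/value vectors of patch tokens.
Q = rng.normal(size=(num_patches, dim))
K = rng.normal(size=(num_patches, dim))
V = rng.normal(size=(num_patches, dim))

def masked_attention(Q, K, V, keep):
    # Scores toward "removed" patches are set to -inf, so the softmax
    # assigns them exactly zero attention weight.
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores[:, ~keep] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

keep = np.array([True, True, False, True, False, True])
out = masked_attention(Q, K, V, keep)
```

The appeal of this removal scheme is that the model never sees imputed pixel values; the masked patches simply contribute nothing to the attention output.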
@ianccovert
Ian Covert
2 years
In a previous paper, we showed that SVs have a variational characterization that can be used as an objective function and trained with SGD. We took that idea a bit further here, proving that the loss upper bounds the distance to the true SVs due to strong convexity (5/n).
Tweet media one
1
0
2
@ianccovert
Ian Covert
2 years
The issue is how to do this efficiently, because the number of patch subsets is exponential in the number of patches. So how do we handle large models like ViTs? Our solution: amortized optimization (4/n).
1
0
2
@ianccovert
Ian Covert
2 years
The question we’re trying to answer is *which patches influence the prediction.* And Shapley values are a surprisingly simple approach: they're like leave-one-out, but the effect of removing a patch is averaged across all sets of preceding patches (3/n).
1
0
3
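The "averaged leave-one-out" view above can be written down directly for a tiny hypothetical game (an illustration, not the paper's setup): average each player's marginal effect over every ordering of the preceding players.

```python
from itertools import permutations

# Hypothetical 3-player game: player 0 contributes 2.0, player 1 contributes
# 1.0, and players 0 and 2 share a 0.5 interaction.
def value(subset):
    s = frozenset(subset)
    return 2.0 * (0 in s) + 1.0 * (1 in s) + 0.5 * (0 in s and 2 in s)

def shapley_by_permutations(value, players):
    # For every ordering, add each player's marginal contribution when it
    # joins the players that precede it, then average over orderings.
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        included = []
        for p in order:
            before = value(included)
            included.append(p)
            phi[p] += value(included) - before
    return {p: phi[p] / len(perms) for p in players}

phi = shapley_by_permutations(value, [0, 1, 2])
```

Plain leave-one-out would miss the interaction term entirely; averaging over all preceding sets splits it fairly between players 0 and 2.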
@ianccovert
Ian Covert
2 years
TL;DR: we fine-tune a ViT to directly predict its Shapley values, without using a dataset of ground-truth examples. And it works quite well (2/n).
Tweet media one
1
0
8
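A minimal sketch of the amortized training idea in the TL;DR above, under heavy simplifying assumptions of my own: the "explainer" is a linear map instead of a ViT, and the game family is linear in the input, so the true Shapley values are known in closed form (they equal the input itself). Training on the Shapley-kernel objective alone, with no ground-truth attribution labels, still recovers them.

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(0)
n, m = 4, 20   # "patches" per input, number of training inputs

# Toy game family: v_x(S) = sum_{i in S} x_i, whose Shapley values are x.
X = rng.normal(size=(m, n))

subsets = [S for k in range(1, n) for S in combinations(range(n), k)]
Z = np.array([[i in S for i in range(n)] for S in subsets], float)
w = np.array([(n - 1) / (comb(n, len(S)) * len(S) * (n - len(S)))
              for S in subsets])

W = np.zeros((n, n))                   # linear "explainer" (stand-in for a ViT)
P = np.eye(n) - np.ones((n, n)) / n    # Jacobian of the efficiency adjustment

for _ in range(1000):
    grad = np.zeros((n, n))
    for x in X:
        phi = W @ x
        phi = phi + (x.sum() - phi.sum()) / n   # additive efficiency step
        err = Z @ phi - Z @ x                   # residual vs v_x(S) - v_x({})
        grad += np.outer(P @ (2 * Z.T @ (w * err)), x)
    W -= 0.2 * grad / m

# After training, one forward pass yields the attributions for a new input.
x_test = rng.normal(size=n)
phi = W @ x_test
phi = phi + (x_test.sum() - phi.sum()) / n
```

The point of amortization is the last three lines: attribution for a fresh input costs a single forward pass, with no per-input optimization or subset enumeration.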