Explainable AI (@XAI_Research)
Followers: 2K · Following: 616 · Media: 14 · Statuses: 1K
Moved to 🦋! Explainable/Interpretable AI researchers and enthusiasts - DM to join the XAI Slack! Twitter and Slack maintained by @NickKroeger1
Joined March 2022
There's a new XAI Slack! Connect with XAI/IML researchers and enthusiasts from around the world. Discuss interpretability methods, get help on challenging problems, and meet experts in your field! DM to join 🥳
Our Theory of Interpretable AI (https://t.co/e9914pRv7y) will soon celebrate its one-year anniversary! 🥳 As we step into our second year, we'd love to hear from you! What papers would you like to see discussed in our seminar in the future? @tverven @ML_Theorist
tverven.github.io
The Theory of Interpretable AI Seminar is an international online seminar about the theoretical foundations of interpretable and explainable AI.
Excited to share: "Learning to Generate Unit Tests for Automated Debugging", which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and to debug code from generated tests. UTGen+UTDebug improve LLM-based code debugging by addressing 3 key
Super excited to share our latest preprint that unifies multiple areas within explainable AI that have been evolving somewhat independently:
1. Feature Attribution
2. Data Attribution
3. Model Component Attribution (aka Mechanistic Interpretability)
https://t.co/Sr9gvMDxoG
Reminder we have moved to 🦋! Stay up to date with the latest XAI research!
Hot off the press: my PhD thesis "Foundations of machine learning interpretability" is officially published! Enjoy it at https://t.co/tRiWkTp3MV
@XAI_Research @trustworthy_ml
theses.hal.science
The rising use of complex Machine Learning (ML) models, especially in critical applications, has highlighted the urgent need for interpretability methods. Despite the variety of solutions proposed to...
We have moved to 🦋 Bluesky! Please follow over there @ XAI-Research https://t.co/PEc4tAu96R
Exciting opportunity at the intersection of climate science and XAI to work on groundbreaking research in attributing extreme precipitation events with multimodal models. Check out the details and help spread the word! #ClimateAI #Postdoc #UVA #Hiring Job description:
Dear Climate and AI community! We are hiring a postdoc to join @UVAEnvironment at @UVA and work with @_cagarwal and me on using multimodal AI models and explainable AI to attribute extreme precipitation events! Fascinating stuff! Link below. Please RT!
Curious about what's really happening inside vision models? Join us at the First Workshop on Mechanistic Interpretability for Vision (MIV) at @CVPR! Website: https://t.co/Ynpv1osH0t Meet our amazing invited speakers! #CVPR2025 #MIV25 #MechInterp #ComputerVision
The later features in DINO-v2 are more abstract and semantically meaningful than I'd expected from the training objectives. This neuron responds only to hugs. Nothing else, just hugs.
This week's Apart News brings you an *exclusive* interview with interpretability insider @myra_deng of @GoodfireAI & revisits our Sparse Autoencoders Hackathon which featured a memorable talk from @GoogleDeepMind's @NeelNanda5.
@dylanjsam Hi Dylan, it reminds me of our paper where we also train a model (model 2) on the output of another black-box model (model 1). Ultimately, we find that combining the outputs of model 2 and model 1 significantly improves performance. https://t.co/QY7XPpCMM0
openreview.net
Nearest neighbors (NN) are traditionally used to compute final decisions, e.g., in Support Vector Machines or k-NN classifiers, and to provide users with explanations for the model's decision. In...
In case you missed it: here is the recording of @YishayMansour's talk about the ability of decision trees to approximate concepts: https://t.co/DERjJawP7R For upcoming talks, check out the seminar website: tverven.github.io
LLMs are all circuits and patterns. Nice paper for a long weekend read: "A Primer on the Inner Workings of Transformer-based Language Models"
- Provides a concise intro focusing on the generative decoder-only architecture.
- Introduces the Transformer layer components,
@_cagarwal Follow us for AI safety insights https://t.co/hfntPPogY0 and watch the full video
This Thursday (in 3 days), @YishayMansour will discuss interpretable approximations: learning with interpretable models. Is it the same as regular learning? Attend the lecture to find out! Website: https://t.co/MPJzLcxNfI
@Suuraj @tverven
The Theory of Interpretable AI seminar is back after the holiday season!
Our next talk is next Thursday by Yishay Mansour, who will talk about interpretable approximations. Website: https://t.co/MPJzLcxNfI Date: 16 Jan @Suuraj @tverven @YishayMansour
We're open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B! These are, to the best of our knowledge, the first open-source SAEs for models at this scale and capability level.
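The sparse autoencoder recipe behind releases like this is compact enough to sketch. Below is a minimal, untrained forward pass in numpy, purely illustrative: the weights are random and the sizes are toy values, not those of the released Llama SAEs. An activation vector is encoded into a wider non-negative feature vector via ReLU, decoded back, and training would balance reconstruction error against an L1 sparsity penalty on the features.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 16, 64  # toy sizes; real SAEs use far wider hidden dims

# Randomly initialized weights for the sketch; a trained SAE learns these.
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps features non-negative
    x_hat = f @ W_dec + b_dec               # linear decode back to model space
    return f, x_hat

x = rng.normal(size=d_model)  # stand-in for a residual-stream activation
features, reconstruction = sae_forward(x)

# Training would minimize reconstruction error plus an L1 sparsity penalty,
# pushing most features to exactly zero so each one stays interpretable:
l1_coeff = 1e-3
loss = np.mean((x - reconstruction) ** 2) + l1_coeff * np.abs(features).sum()
```

The ReLU-plus-L1 combination is what produces the sparse, (hopefully) monosemantic features that interpretability work inspects.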
What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.
ACL time @ Bangkok! Our GNNavi work will be presented in the poster session at 12:30 on Aug. 14 (Wed.). Welcome to drop by and exchange with us! Looking forward to talking with people, especially those interested in multilingual, low-resource, and LLM interpretability!