Matthew Kowal
@MatthewKowal9
Followers
478
Following
6K
Media
42
Statuses
799
Researcher @FARAIResearch / Previously PhD @YorkUniversity @VectorInst / Intern @UbisoftLaForge @ToyotaResearch @_NextAI / Interpretability + AI Safety
Toronto, Canada
Joined March 2019
ICYMI highlights from our work last quarter!
This quarter, we red-teamed GPT-5, disclosed critical persuasion vulnerabilities to frontier labs (resulting in patches!), and co-organized AI Safety Connect at UNGA. Join us Dec 1-2 for the San Diego Alignment Workshop. Plus, we're expanding 2x & hiring! 👇
0
1
7
I’m recruiting PhD students for 2026! If you are interested in robustness, training dynamics, interpretability for scientific understanding, or the science of LLM analysis, you should apply. BU is building a huge LLM analysis/interp group, and you’ll be joining on the ground floor.
Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a burgeoning supergroup w/ @najoungkim @amuuueller. Looking for my first students, so apply and reach out!
18
125
663
🕳️🐇Into the Rabbit Hull – Part I (Part II tomorrow) An interpretability deep dive into DINOv2, one of vision’s most important foundation models. Today is Part I: buckle up, we're exploring some of its most charming features.
10
119
639
🕳️🐇Into the Rabbit Hull – Part II Continuing our interpretation of DINOv2, the second part of our study concerns the geometry of concepts and the synthesis of our findings toward a new representational phenomenology: the Minkowski Representation Hypothesis
5
67
380
The takeaway for me: LLMs separate their token processing from their conceptual processing. Akin to humans' dual-route processing of speech. We need to be aware of whether an LM is thinking about tokens or concepts. They do both, and it makes a difference which way it's thinking.
1
1
12
Communicating with video models: our new work shows emergent behaviors of Veo3, which is able to do A LOT of tasks it wasn't trained to do! Video models are now entering the stage LLMs were at a few years back: emergent behaviors, being able to capture what humans want to…
video-zero-shot.github.io
Video models like Veo 3 are on a path to become vision foundation models.
Could video models be the path to general visual intelligence? In our new paper, we show that Veo3 has emergent zero-shot capabilities, solving complex tasks across the vision stack. Project page: https://t.co/WwVuZ5P9Y6 Paper: https://t.co/pHIX8uDpaH 🧵👇🏻
2
16
92
This is going to revolutionize education 📚 Google just launched "Learn Your Way", which basically takes whatever boring chapter you're supposed to read and rebuilds it around stuff you actually give a damn about. Like if you're into basketball and have to learn Newton's laws…
187
2K
9K
Are there conceptual directions in VLMs that transcend modality? Check out our COLM spotlight🔦 paper! We analyze how linear concepts interact with multimodality in VLM embeddings using SAEs with @Huangyu58589918, @napoolar, @ShamKakade6 and Stephanie Gil https://t.co/4d9yDIeePd
10
87
510
This was a really fun project to work on - and huge shoutouts to my amazing collaborators who made the project such a delight!! 🎉💪
1/ Many frontier AIs are willing to persuade on dangerous topics, according to our new benchmark: Attempt to Persuade Eval (APE). Here’s Google’s most capable model, Gemini 2.5 Pro, trying to convince a user to join a terrorist group 👇
1
1
6
I’m thrilled to be joining @cohere in the role of Chief AI Officer, helping advance cutting-edge research and product development. Cohere has an incredible team and mission. Exciting new chapter for me!
We’re excited to announce $500M in new funding to accelerate our global expansion and build the next generation of enterprise AI technology! We are also welcoming two additions to our leadership team: Joelle Pineau as Chief AI Officer and Francois Chadwick as Chief Financial Officer.
124
71
2K
Are you afraid of LLMs teaching people how to build bioweapons? Have you tried just... not teaching LLMs about bioweapons? @AIEleuther and @AISecurityInst joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study
28
74
565
Thrilled to welcome @EkdeepL to the team! Ekdeep is working on a new research agenda on “cognitive interpretability”, aimed at adapting and improving theories of human cognition to design tools for explaining model cognition.
3
7
168
nn layers align their singular vectors
each matrix syncs to its neighbor, its rotation neatly clicking into the basis directions of the next rotation. like two gears precision-machined to be partners
LLMs are swiss watches, ticking in a billion-dimensional pocket universe
5
23
321
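A minimal sketch, in case it helps make the "gears" image concrete: one way to probe whether adjacent weight matrices align is to compare the output singular basis of one matrix with the input singular basis of the next. The function name and toy shapes below are illustrative assumptions, not a result or method from any particular paper.

```python
# Hypothetical sketch: overlap between the output singular directions of W1
# and the input singular directions of the next matrix W2.
import torch

def singular_alignment(W1: torch.Tensor, W2: torch.Tensor) -> torch.Tensor:
    """W1 maps d0 -> d1 (shape d1 x d0); W2 consumes W1's output (shape d2 x d1).
    Returns |cosine| overlaps between W2's input singular vectors and W1's output
    singular vectors; a near-permutation matrix would mean the bases 'mesh'."""
    U1, _, _ = torch.linalg.svd(W1, full_matrices=False)   # columns: output directions of W1
    _, _, Vh2 = torch.linalg.svd(W2, full_matrices=False)  # rows: input directions of W2
    return (Vh2 @ U1).abs()

# Toy usage: random matrices give diffuse overlap (no alignment);
# trained adjacent layers would show sharper, more concentrated rows.
W1 = torch.randn(256, 128)   # maps 128 -> 256
W2 = torch.randn(64, 256)    # maps 256 -> 64
overlap = singular_alignment(W1, W2)
print(overlap.shape, overlap.max(dim=1).values.mean())
```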
We wrote this paper after an ICLR reviewer claimed that everyone knows global pooling removes all spatial information. They used that argument to reject a submission on a completely different topic. Thanks Reviewer 2, yes we mean it 😉
Meet Amirul @amirul0507 and Matt @MatthewKowal9 the #ComputerVision MythBusters. Myth: "A global pooling layer removes spatial position information." Drop by our @ICCV_2021 #ICCV2021 poster to see this myth BUSTED! Session 1B: Thursday 5 PM EDT @YorkUniversity @LassondeSchool
2
4
72
We worked with @OpenAI to test GPT-5 and improve its safeguards. We applaud OpenAI's free sharing of 3rd-party testing and responsiveness to feedback. However, our testing uncovered key limitations with the safeguards and threat modeling, which we hope OpenAI will soon resolve.
1
13
46
🧑🍳🍴On the concept menu for tonight: You have a choice of main course between 4413 (🍝) or 4538 (🍕), paired with 2587 (🍷), followed by a delicious dessert choice between 4183 (🍨) or 4893 (🍰)
🌌🛰️🔭Want to explore universal visual features? Check out our interactive demo of concepts learned from our #ICML2025 paper "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment". Come see our poster at 4pm on Tuesday in East Exhibition hall A-B, E-1208!
0
4
14
Couldn’t be more excited to share our latest paper — accepted to ICML 2025 @icmlconf — with JP Morgan AI Research. It explores a simple question: To safely and effectively mitigate errors post-training, when (and how much) should we steer large language models? 🧵
1
4
12
1/8: The Emergent Misalignment paper showed LLMs trained on insecure code then want to enslave humanity...?! We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition
15
49
261
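A minimal sketch of the "single direction" idea the thread describes, under loud assumptions: the hook point, layer index, and scaling factor below are hypothetical, and this is a generic mean-difference steering recipe, not the papers' method.

```python
# Hypothetical sketch: estimate a behavior direction from activation statistics,
# then add (amplify) or subtract (ablate) it from a layer's output at inference
# time via a forward hook.
import torch

def mean_difference_direction(acts_pos: torch.Tensor, acts_neg: torch.Tensor) -> torch.Tensor:
    """acts_pos / acts_neg: (n_examples, d_model) activations on prompts that do /
    do not elicit the behavior. Returns a unit vector pointing toward the behavior."""
    direction = acts_pos.mean(dim=0) - acts_neg.mean(dim=0)
    return direction / direction.norm()

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Forward hook that shifts a module's output along `direction`.
    Assumes the hooked module returns a plain tensor."""
    def hook(module, inputs, output):
        return output + alpha * direction
    return hook

# Usage sketch (hypothetical hook point and scale):
# handle = model.transformer.h[12].mlp.register_forward_hook(
#     make_steering_hook(direction, alpha=-4.0))  # negative alpha suppresses the behavior
# ...run generation, then handle.remove()
```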
*Universal Sparse Autoencoders* by @HThasarathan @Napoolar @MatthewKowal9 @CSProfKGD. They train a shared SAE latent space on several vision encoders at once, showing, e.g., how the same concept activates in different models. https://t.co/pOnnT2WceS
3
41
255
🚨 New paper alert! The linear representation hypothesis (LRH) argues that concepts are encoded as a **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
5
62
387
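For context on the tool the LRH motivates, here is a minimal sparse-autoencoder sketch: reconstruct an activation as a sparse, non-negative combination of learned dictionary directions (columns of the decoder weight). The sizes, the ReLU encoder, and the L1 coefficient are common but assumed choices, not this paper's implementation.

```python
# Minimal SAE sketch: reconstruction loss plus an L1 sparsity penalty on codes.
# Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_dict: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activation -> feature codes
        self.decoder = nn.Linear(d_dict, d_model)  # codes -> reconstruction

    def forward(self, x: torch.Tensor):
        codes = torch.relu(self.encoder(x))        # non-negative codes; sparsity comes from the L1 term
        return self.decoder(codes), codes

def sae_loss(x, recon, codes, l1_coeff: float = 1e-3):
    # reconstruction error + L1 penalty encouraging few active dictionary elements
    return ((recon - x) ** 2).mean() + l1_coeff * codes.abs().mean()

# Toy usage on random "activations"; a real SAE is trained on model activations.
sae = SparseAutoencoder()
x = torch.randn(32, 768)
recon, codes = sae(x)
print(sae_loss(x, recon, codes))
```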