Matthew Kowal
@MatthewKowal9
Followers
478
Following
6K
Media
42
Statuses
799
Researcher @FARAIResearch / Previously PhD @YorkUniversity @VectorInst / Intern @UbisoftLaForge @ToyotaResearch @_NextAI / Interpretability + AI Safety
Toronto, Canada
Joined March 2019
ICYMI highlights from our work last quarter!
This quarter, we red-teamed GPT-5, disclosed critical persuasion vulnerabilities to frontier labs (resulting in patches!), and co-organized AI Safety Connect at UNGA. Join us Dec 1-2 for the San Diego Alignment Workshop. Plus, we're expanding 2x & hiring! 👇
0
1
7
I’m recruiting PhD students for 2026! If you are interested in robustness, training dynamics, interpretability for scientific understanding, or the science of LLM analysis, you should apply. BU is building a huge LLM analysis/interp group, and you’ll be joining on the ground floor.
Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a burgeoning supergroup w/ @najoungkim @amuuueller. Looking for my first students, so apply and reach out!
18
125
663
🕳️🐇Into the Rabbit Hull – Part I (Part II tomorrow) An interpretability deep dive into DINOv2, one of vision’s most important foundation models. Today is Part I: buckle up, we're exploring some of its most charming features.
10
119
639
🕳️🐇Into the Rabbit Hull – Part II Continuing our interpretation of DINOv2, the second part of our study concerns the geometry of concepts and the synthesis of our findings toward a new representational phenomenology: the Minkowski Representation Hypothesis
5
67
380
The takeaway for me: LLMs separate their token processing from their conceptual processing. Akin to humans' dual-route processing of speech. We need to be aware of whether an LM is thinking about tokens or concepts. They do both, and it makes a difference which way it's thinking.
1
1
12
Communicating with video models: our new work shows emergent behaviors of Veo3, which is able to do A LOT of tasks it wasn't trained to do! Video models are now entering the stage LLMs were at a few years back: emergent behaviors, being able to capture what humans want to…
video-zero-shot.github.io
Video models like Veo 3 are on a path to become vision foundation models.
Could video models be the path to general visual intelligence? In our new paper, we show that Veo3 has emergent zero-shot capabilities, solving complex tasks across the vision stack. Project page: https://t.co/WwVuZ5P9Y6 Paper: https://t.co/pHIX8uDpaH 🧵👇🏻
2
16
92
This is going to revolutionize education 📚 Google just launched "Learn Your Way", which basically takes whatever boring chapter you're supposed to read and rebuilds it around stuff you actually give a damn about. Like if you're into basketball and have to learn Newton's laws…
187
2K
9K
Are there conceptual directions in VLMs that transcend modality? Check out our COLM spotlight🔦 paper! We analyze how linear concepts interact with multimodality in VLM embeddings using SAEs with @Huangyu58589918, @napoolar, @ShamKakade6 and Stephanie Gil https://t.co/4d9yDIeePd
10
87
510
This was a really fun project to work on - and huge shoutouts to my amazing collaborators who made the project such a delight!! 🎉💪
1/ Many frontier AIs are willing to persuade on dangerous topics, according to our new benchmark: Attempt to Persuade Eval (APE). Here’s Google’s most capable model, Gemini 2.5 Pro, trying to convince a user to join a terrorist group 👇
1
1
6
I’m thrilled to be joining @cohere in the role of Chief AI Officer, helping advance cutting-edge research and product development. Cohere has an incredible team and mission. Exciting new chapter for me!
We’re excited to announce $500M in new funding to accelerate our global expansion and build the next generation of enterprise AI technology! We are also welcoming two additions to our leadership team: Joelle Pineau as Chief AI Officer and Francois Chadwick as Chief Financial Officer.
124
71
2K
Are you afraid of LLMs teaching people how to build bioweapons? Have you tried just... not teaching LLMs about bioweapons? @AIEleuther and @AISecurityInst joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study
28
74
565
Thrilled to welcome @EkdeepL to the team! Ekdeep is working on a new research agenda on “cognitive interpretability”, aimed at adapting and improving theories of human cognition to design tools for explaining model cognition.
3
7
168
nn layers align their singular vectors
each matrix syncs to its neighbor, its rotation neatly clicking into the basis directions of the next rotation. like two gears precision-machined to be partners
LLMs are swiss watches, ticking in a billion-dimensional pocket universe
5
23
321
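A minimal sketch, in case it helps make the "gears" image concrete: one way to probe whether adjacent weight matrices align is to compare the output singular basis of one matrix with the input singular basis of the next. The function name and toy shapes below are illustrative assumptions, not a result or method from any particular paper.

```python
# Hypothetical sketch: overlap between the output singular directions of W1
# and the input singular directions of the next matrix W2.
import torch

def singular_alignment(W1: torch.Tensor, W2: torch.Tensor) -> torch.Tensor:
    """W1 maps d0 -> d1 (shape d1 x d0); W2 consumes W1's output (shape d2 x d1).
    Returns |cosine| overlaps between W2's input singular vectors and W1's output
    singular vectors; a near-permutation matrix would mean the bases 'mesh'."""
    U1, _, _ = torch.linalg.svd(W1, full_matrices=False)   # columns: output directions of W1
    _, _, Vh2 = torch.linalg.svd(W2, full_matrices=False)  # rows: input directions of W2
    return (Vh2 @ U1).abs()

# Toy usage: random matrices give diffuse overlap (no alignment);
# trained adjacent layers would show sharper, more concentrated rows.
W1 = torch.randn(256, 128)   # maps 128 -> 256
W2 = torch.randn(64, 256)    # maps 256 -> 64
overlap = singular_alignment(W1, W2)
print(overlap.shape, overlap.max(dim=1).values.mean())
```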
We wrote this paper after an ICLR reviewer claimed that everyone knows global pooling removes all spatial information. They used that argument to reject a submission on a completely different topic. Thanks Reviewer 2, yes we mean it 😉
Meet Amirul @amirul0507 and Matt @MatthewKowal9 the #ComputerVision MythBusters. Myth: "A global pooling layer removes spatial position information." Drop by our @ICCV_2021 #ICCV2021 poster to see this myth BUSTED! Session 1B: Thursday 5 PM EDT @YorkUniversity @LassondeSchool
2
4
72
We worked with @OpenAI to test GPT-5 and improve its safeguards. We applaud OpenAI's free sharing of 3rd-party testing and responsiveness to feedback. However, our testing uncovered key limitations with the safeguards and threat modeling, which we hope OpenAI will soon resolve.
1
13
46
🧑🍳🍴On the concept menu for tonight: You have a choice of main course between 4413 (🍝) or 4538 (🍕), paired with 2587 (🍷), followed by a delicious dessert choice between 4183 (🍨) or 4893 (🍰)
🌌🛰️🔭Want to explore universal visual features? Check out our interactive demo of concepts learned from our #ICML2025 paper "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment". Come see our poster at 4pm on Tuesday in East Exhibition hall A-B, E-1208!
0
4
14
Couldn’t be more excited to share our latest paper — accepted to ICML 2025 @icmlconf — with JP Morgan AI Research. It explores a simple question: To safely and effectively mitigate errors post-training, when (and how much) should we steer large language models? 🧵
1
4
12
1/8: The Emergent Misalignment paper showed LLMs trained on insecure code then want to enslave humanity...?! We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition
15
49
261
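A minimal sketch of the "single direction" idea the thread describes, under loud assumptions: the hook point, layer index, and scaling factor below are hypothetical, and this is a generic mean-difference steering recipe, not the papers' method.

```python
# Hypothetical sketch: estimate a behavior direction from activation statistics,
# then add (amplify) or subtract (ablate) it from a layer's output at inference
# time via a forward hook.
import torch

def mean_difference_direction(acts_pos: torch.Tensor, acts_neg: torch.Tensor) -> torch.Tensor:
    """acts_pos / acts_neg: (n_examples, d_model) activations on prompts that do /
    do not elicit the behavior. Returns a unit vector pointing toward the behavior."""
    direction = acts_pos.mean(dim=0) - acts_neg.mean(dim=0)
    return direction / direction.norm()

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Forward hook that shifts a module's output along `direction`.
    Assumes the hooked module returns a plain tensor."""
    def hook(module, inputs, output):
        return output + alpha * direction
    return hook

# Usage sketch (hypothetical hook point and scale):
# handle = model.transformer.h[12].mlp.register_forward_hook(
#     make_steering_hook(direction, alpha=-4.0))  # negative alpha suppresses the behavior
# ...run generation, then handle.remove()
```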
*Universal Sparse Autoencoders* by @HThasarathan @Napoolar @MatthewKowal9 @CSProfKGD. They train a shared SAE latent space on several vision encoders at once, showing, e.g., how the same concept activates in different models. https://t.co/pOnnT2WceS
3
41
255
🚨 New paper alert! The linear representation hypothesis (LRH) argues that concepts are encoded as a **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
5
62
387
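For context on the tool the LRH motivates, here is a minimal sparse-autoencoder sketch: reconstruct an activation as a sparse, non-negative combination of learned dictionary directions (columns of the decoder weight). The sizes, the ReLU encoder, and the L1 coefficient are common but assumed choices, not this paper's implementation.

```python
# Minimal SAE sketch: reconstruction loss plus an L1 sparsity penalty on codes.
# Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_dict: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activation -> feature codes
        self.decoder = nn.Linear(d_dict, d_model)  # codes -> reconstruction

    def forward(self, x: torch.Tensor):
        codes = torch.relu(self.encoder(x))        # non-negative codes; sparsity comes from the L1 term
        return self.decoder(codes), codes

def sae_loss(x, recon, codes, l1_coeff: float = 1e-3):
    # reconstruction error + L1 penalty encouraging few active dictionary elements
    return ((recon - x) ** 2).mean() + l1_coeff * codes.abs().mean()

# Toy usage on random "activations"; a real SAE is trained on model activations.
sae = SparseAutoencoder()
x = torch.randn(32, 768)
recon, codes = sae(x)
print(sae_loss(x, recon, codes))
```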