Thomas Fel

@Napoolar

Followers: 2K · Following: 10K · Media: 93 · Statuses: 1K

Explainability, Computer Vision, Neuro-AI @Harvard. Research Fellow @KempnerInst. Prev. @tserre lab, @Google, @GoPro. Crêpe lover.

Boston, MA
Joined February 2017
@Napoolar
Thomas Fel
26 days
🕳️🐇Into the Rabbit Hull – Part I (Part II tomorrow) An interpretability deep dive into DINOv2, one of vision’s most important foundation models. And today is Part I, buckle up, we're exploring some of its most charming features.
10
119
639
@SashaBoguraev
Sasha Boguraev @ EMNLP
6 months
A key hypothesis in the history of linguistics is that different constructions share underlying structure. We take advantage of recent advances in mechanistic interpretability to test this hypothesis in Language Models. New work with @kmahowald and @ChrisGPotts! 🧵👇
2
27
97
@GoodfireAI
Goodfire
3 days
LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? @jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)
10
126
776
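For readers wondering what "loss curvature" could look like in practice, here is a minimal sketch, not the paper's method, of estimating the top curvature of a model's loss via a Hessian-vector-product power iteration. `model`, `loss_fn`, and `batch` are hypothetical placeholders.

```python
# Minimal sketch (assumed, generic): top loss curvature via HVP power iteration.
import torch

def top_loss_curvature(model, loss_fn, batch, iters=20):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model, batch)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]           # random start vector
    eigval = None
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]                        # normalize
        gv = sum((g * x).sum() for g, x in zip(grads, v))            # grad . v
        hv = torch.autograd.grad(gv, params, retain_graph=True)      # H v
        eigval = sum((h * x).sum() for h, x in zip(hv, v)).item()    # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eigval  # approximates the largest Hessian eigenvalue of the loss
```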
@TamarRottShaham
Tamar Rott Shaham
4 days
A key challenge for interpretability agents is knowing when they’ve understood enough to stop experimenting. Our @NeurIPSConf paper introduces a self-reflective agent that measures the reliability of its own explanations and stops once its understanding of models has converged.
2
27
46
@ravfogel
Shauli Ravfogel
17 days
New NeurIPS paper! 🐣Why do LMs represent concepts linearly? We focus on LMs' tendency to linearly separate true and false assertions, and provide a complete analysis of the truth circuit in a toy model. A joint work with @Giladude, @tallinzen, Joan Bruna and @albertobietti.
7
49
479
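As a minimal illustration of the linear-separability claim (not the authors' code): a logistic-regression probe on hidden states of true vs. false assertions. `hidden_states` (n_examples × d) and `labels` are assumed to come from running an LM over a dataset of factual statements.

```python
# Minimal sketch: does a linear probe separate true from false assertions?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def truth_probe_accuracy(hidden_states, labels, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden_states, labels, test_size=0.2, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Accuracy well above chance indicates (approximate) linear separability.
    return probe.score(X_te, y_te)
```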
@Majumdar_Ani
Anirudha Majumdar
18 days
👁️ Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields We find that visual-only features (DINO) outperform visual-geometry features (VGGT) in spatial tasks! 👇
7
31
244
@wesg52
Wes Gurnee
19 days
New paper! We reverse engineered the mechanisms underlying Claude Haiku’s ability to perform a simple “perceptual” task. We discover beautiful feature families and manifolds, clean geometric transformations, and distributed attention algorithms!
45
314
2K
@ShamKakade6
Sham Kakade
18 days
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
18
129
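A toy numpy sketch of the distinction the thread draws, assuming a linear least-squares loss where the Gauss-Newton matrix is exactly XᵀX: an Adam-style diagonal built from empirical squared gradients versus the Gauss-Newton diagonal. The data and weights here are synthetic placeholders.

```python
# Toy comparison: empirical-gradient diagonal (Adam-style) vs Gauss-Newton diagonal
# for loss(w) = 0.5 * mean((X w - y)^2).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = rng.normal(size=256)
w = np.zeros(10)

residual = X @ w - y
per_example_grads = X * residual[:, None]        # gradient of each example's loss
g = per_example_grads.mean(axis=0)               # mini-batch gradient

adam_diag = np.sqrt((per_example_grads ** 2).mean(axis=0))  # RMS of empirical gradients
gn_diag = np.diag(X.T @ X) / len(X)                          # true curvature diagonal here

adam_step = g / (adam_diag + 1e-8)
gn_step = g / (gn_diag + 1e-8)
```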
@MLMazda
Mazda Moayeri
20 days
I'll be speaking at this #ICCV2025 WS today at 4:30pm, room 308B. Fav part of prepping was talking to my friends *outside* of AI, who (turns out) don't trust these systems nor those who build them. Come by to hear how my perspective on trust has transformed over the past year.
@ShirleyYXWu
Shirley Wu
6 months
Can we ever truly trust foundation models—and if so, how? Our ICCV TrustFM workshop (https://t.co/slBvSt9uUZ) is now accepting submissions (deadline: 8/1, attending: 10/19-10/23, Hawai'i). Submit, attend, and learn from everyone around the world who is making FMs more
0
3
17
@mc_mozer
Michael C. Mozer
23 days
[1/4] As you read words in this text, your brain adjusts fixation durations to facilitate comprehension. Inspired by human reading behavior, we propose a supervised objective that trains an LLM to dynamically determine the number of compute steps for each input token.
4
10
25
@nsaphra
Naomi Saphra
24 days
I’m recruiting PhD students for 2026! If you are interested in robustness, training dynamics, interpretability for scientific understanding, or the science of LLM analysis you should apply. BU is building a huge LLM analysis/interp group and you’ll be joining at the ground floor.
@nsaphra
Naomi Saphra
8 months
Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a burgeoning supergroup w/ @najoungkim @amuuueller. Looking for my first students, so apply and reach out!
18
126
667
@AdriGarriga
Adrià Garriga-Alonso
24 days
Potentially field-defining work. I just know there will be follow-ups.
@Napoolar
Thomas Fel
25 days
🕳️🐇Into the Rabbit Hull – Part II Continuing our interpretation of DINOv2, the second part of our study concerns the geometry of concepts and the synthesis of our findings toward a new representational phenomenology: the Minkowski Representation Hypothesis
1
2
52
@Napoolar
Thomas Fel
24 days
Guess the rabbit hull goes deeper than expected haha🐰🎉 Thanks @ylecun !
2
2
83
@RemiCadene
Remi Cadene
25 days
I am starting a venture on top of LeRobot! We’re at a pivotal time. AI is moving beyond digital to the physical world. Embodied AI will change our surroundings in ways we can barely imagine. This technology holds the potential to empower everyone. It must not be controlled by
96
87
740
@Napoolar
Thomas Fel
25 days
That concludes this two-part descent into the Rabbit Hull. Huge thanks to all collaborators who made this work possible — and again especially to @WangBinxu with whom this project was built, experiment after experiment. 🎮 https://t.co/eKZJFatw4k 📄
0
1
19
@Napoolar
Thomas Fel
25 days
If this holds, three implications: (i) Concepts = points (or regions), not directions (ii) Probing is bounded: toward archetypes, not vectors (iii) Can't recover generating hulls from sum: we should look deeper than a single layer's activations to recover the true latents
1
3
19
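A toy illustration of implication (ii), under my reading rather than the authors' formulation: a directional probe is unbounded along its vector, while an archetype probe scores proximity to a point, which is bounded once activations live inside a convex hull.

```python
# Toy sketch: directional probe vs archetype probe (hypothetical helpers).
import numpy as np

def direction_probe_score(x, w):
    """Unbounded: score keeps growing as x moves further along w."""
    return x @ w

def archetype_probe_score(x, archetype):
    """Bounded: maximal when x coincides with the archetype, decays with distance."""
    return -np.linalg.norm(x - archetype, axis=-1)
```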
@Napoolar
Thomas Fel
25 days
Synthesizing these observations, we propose a refined view, motivated by Gärdenfors' theory and attention geometry. Activations = multiple convex hulls simultaneously: a rabbit among animals, brown among colors, fluffy among textures. The Minkowski Representation Hypothesis.
1
2
21
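A minimal sketch of the hypothesis as stated, with hypothetical archetype sets: a token's activation modeled as a Minkowski sum of convex combinations, one point drawn from each concept hull.

```python
# Minimal sketch: activation = Minkowski sum of points from several concept hulls.
import numpy as np

rng = np.random.default_rng(0)
d = 64
hulls = {                         # each hull = a set of archetype vectors (hypothetical)
    "animal":  rng.normal(size=(5, d)),
    "color":   rng.normal(size=(4, d)),
    "texture": rng.normal(size=(3, d)),
}

def sample_from_hull(archetypes):
    """Draw a random convex combination of a hull's archetypes."""
    weights = rng.dirichlet(np.ones(len(archetypes)))
    return weights @ archetypes

# One point per hull, summed: a rabbit among animals, brown among colors, fluffy among textures.
activation = sum(sample_from_hull(a) for a in hulls.values())
```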
@Napoolar
Thomas Fel
25 days
Taken together, the signs of partial density, local connectedness, and coherent dictionary atoms indicate that DINO’s representations are organized beyond linear sparsity alone.
1
1
9
@Napoolar
Thomas Fel
25 days
Can position explain this? We found that pos. information collapses: from high-rank to a near 2-dim sheet. Early layers encode precise location; later ones retain abstract axes. This compression frees dimensions for features, and *position doesn't explain PCA map smoothness*
1
1
13
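A rough sketch of the kind of measurement described (assumed, not the exact procedure): isolate the position-dependent component of patch embeddings by averaging over images at each spatial position, then compute an entropy-based effective rank. Applied per layer, a collapse toward ~2 would match the claim.

```python
# Sketch: effective rank of the position-dependent component of patch embeddings.
import numpy as np

def positional_effective_rank(patch_embeddings):
    """patch_embeddings: array of shape (n_images, n_positions, d) for one layer."""
    pos_mean = patch_embeddings.mean(axis=0)        # (n_positions, d): position component
    pos_mean = pos_mean - pos_mean.mean(axis=0)     # center across positions
    s = np.linalg.svd(pos_mean, compute_uv=False)
    p = (s ** 2) / (s ** 2).sum()
    return np.exp(-(p * np.log(p + 1e-12)).sum())   # entropy-based effective rank
```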
@Napoolar
Thomas Fel
25 days
Patch embeddings form smooth, connected surfaces tracing objects and boundaries -- even after removing positional components. This may suggest interpolative geometry: tokens as mixtures between landmarks, shaped by clustering and spreading forces in the training objectives.
1
1
12
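A minimal sketch of the PCA maps referenced here (assumed workflow): project one image's patch tokens onto their top three principal components and view them on the patch grid; smooth color gradients correspond to the smooth, connected surfaces described.

```python
# Sketch: top-3 PCA map of a single image's patch tokens, viewed as an RGB grid.
import numpy as np

def pca_map(patch_tokens, grid_h, grid_w, n_components=3):
    """patch_tokens: (grid_h * grid_w, d) embeddings for one image."""
    X = patch_tokens - patch_tokens.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ vt[:n_components].T                         # (n_patches, n_components)
    span = proj.max(axis=0) - proj.min(axis=0)
    proj = (proj - proj.min(axis=0)) / (span + 1e-8)       # normalize to [0, 1] for display
    return proj.reshape(grid_h, grid_w, n_components)
```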
@Napoolar
Thomas Fel
25 days
We found antipodal feature pairs (dᵢ ≈ − dⱼ): vertical vs horizontal lines, white vs black shirts, left vs right… Also, co-activation statistics only moderately shape geometry: concepts that fire together aren't necessarily nearby—nor orthogonal when they don't.
1
1
13
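A small sketch (assumed setup) of the antipodal-pair check on a dictionary of unit-norm concept atoms D: pairs with cosine similarity near −1 satisfy dᵢ ≈ −dⱼ.

```python
# Sketch: find near-antipodal pairs of dictionary atoms via cosine similarity.
import numpy as np

def antipodal_pairs(D, threshold=-0.9):
    """D: (n_atoms, d) dictionary; returns (i, j, cosine) with cosine below threshold."""
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    sims = D @ D.T
    idx = np.argwhere(np.triu(sims < threshold, k=1))   # only i < j
    return [(int(i), int(j), float(sims[i, j])) for i, j in idx]
```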