Max Seitzer Profile
Max Seitzer

@maxseitzer

599 Followers · 37 Following · 35 Media · 79 Statuses

Researcher in the DINO team at Meta FAIR. Before: PhD at Max Planck Institute for Intelligent Systems, Tübingen. Representation learning, agents, structure.

Joined January 2021
@maxseitzer
Max Seitzer
15 days
Introducing DINOv3 🦕🦕🦕. A SotA-enabling vision foundation model, trained with pure self-supervised learning (SSL) at scale. High-quality dense features, combining unprecedented semantic and geometric scene understanding. Three reasons why this matters…
@maxseitzer
Max Seitzer
3 days
We have not gone much deeper than that, so I think there is more to find out about the role of those dimensions and how to potentially avoid them!
@maxseitzer
Max Seitzer
3 days
4) These outliers are *different* from the high-norm tokens studied by the registers paper or by An et al. in LLMs, and are not removed by registers & attention bias. My guess is that they are specific to the DINO setup of different heads & losses.
[Linked: arxiv.org — "Outliers have been widely observed in Large Language Models (LLMs), significantly impacting model performance and posing challenges for model compression. Understanding the functionality and…"]
@maxseitzer
Max Seitzer
3 days
3) Why? I think they enable mode switching in the final LN, selecting dims for the heads. Before LN, DINO CLS & iBOT mask tokens share many top dims. After LN, the top dims differ completely. The outlier dim is distributed differently for the 2 token types, likely enabling this.
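The mode-switching idea above can be illustrated with a toy LayerNorm. This is a minimal numpy sketch with made-up dimensions and magnitudes, not DINOv3's actual features: two token types share the same top dims before LN, but an outlier dim taking opposite signs shifts each token's mean/std differently, so their post-LN top dims diverge.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Final LayerNorm over channels, no learned affine for simplicity.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

rng = np.random.default_rng(0)
d = 8                                     # toy channel count
cls_tok = rng.normal(scale=0.1, size=d)   # stand-in for a DINO CLS token
mask_tok = rng.normal(scale=0.1, size=d)  # stand-in for an iBOT mask token
cls_tok[[1, 2]] = 5.0                     # shared top dims before LN...
mask_tok[[1, 2]] = 5.0
cls_tok[7] = 100.0                        # ...and one outlier dim whose value
mask_tok[7] = -100.0                      # differs between the two token types

def topk(v, k=2):
    return set(np.argsort(np.abs(v))[-k:])

# Pre-LN: both token types share top dims {1, 2} (outlier dim excluded).
print(topk(cls_tok[:7]), topk(mask_tok[:7]))
# Post-LN: the outlier shifts each token's statistics differently, so the
# top non-outlier dims no longer coincide between the two token types.
print(topk(layer_norm(cls_tok)[:7]), topk(layer_norm(mask_tok)[:7]))
```

The outlier never needs to be a "top" dim itself after LN; it is enough that it drags the per-token mean and variance by different amounts for the two token types.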
@maxseitzer
Max Seitzer
3 days
2) We tried different ways to get rid of the outlier dimensions (L2 reg/Linf reg/topk masking), but they either re-emerged, or performance was negatively affected. So indeed they appear critical for the function of the model.
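The three mitigation attempts can be sketched in numpy. These are assumed toy forms of the regularizers named above (L2, Linf, top-k masking), not the exact training losses used for DINOv3:

```python
import numpy as np

def outlier_penalties(feats, dims, l2_weight=0.01, linf_weight=0.01):
    """Toy penalties on suspected outlier dims: average activation mass (L2)
    or the single worst activation (Linf)."""
    x = feats[:, dims]
    l2 = l2_weight * np.mean(x ** 2)      # push the dims toward zero on average
    linf = linf_weight * np.abs(x).max()  # clamp the peak value
    return l2, linf

def topk_mask(feats, k):
    """Keep only each token's k largest-magnitude channels, zero the rest."""
    keep = np.argsort(np.abs(feats), axis=1)[:, -k:]
    out = np.zeros_like(feats)
    np.put_along_axis(out, keep, np.take_along_axis(feats, keep, axis=1), axis=1)
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))           # 4 tokens, 8 channels (toy scale)
feats[:, 3] += 50.0                       # channel 3 plays the outlier role
l2, linf = outlier_penalties(feats, dims=[3])
masked = topk_mask(feats, k=2)
print(l2, linf, (masked != 0).sum(axis=1))  # k nonzero channels per token
```

The failure mode described in the tweet is not visible in a static sketch like this: it only shows up during training, when the penalized dims re-emerge elsewhere or the loss hurts downstream performance.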
@maxseitzer
Max Seitzer
3 days
1) These dimensions carry no information, in the sense that the channels can be zeroed in the pre-final-LN features without a drop in downstream metrics (e.g. linear probing on the ablated features). Of course, applying the final LN to the zeroed features changes the output statistics.
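The second half of the point above is easy to see in a toy numpy example (made-up features, not actual DINOv3 activations): zeroing an outlier channel before LN leaves every other raw channel untouched, yet changes *all* channels of the LN output, because the outlier dominated the per-token mean and variance.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Per-token LayerNorm over the channel dimension (no learned affine).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))       # 4 tokens, 8 channels (toy scale)
feats[:, 3] += 50.0                   # channel 3 plays the outlier role

ablated = feats.copy()
ablated[:, 3] = 0.0                   # zero the outlier channel pre-LN

# The raw features differ only in the zeroed channel...
assert np.allclose(np.delete(feats, 3, axis=1), np.delete(ablated, 3, axis=1))

# ...but after the final LN, every channel's value changes.
diff = np.abs(layer_norm(feats) - layer_norm(ablated)).max()
print(f"max change in LN output: {diff:.3f}")
```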
@maxseitzer
Max Seitzer
3 days
Nice investigation! We did study those outlier dimensions a bit for the 7B model, summarized in Section A.2 of the paper. Some comments:
@rgilman33
Rudy Gilman
4 days
DINO-v3 has a single high-magnitude channel on its residual pathway, channel 416. Turning off this single channel affects DINO's entire output by 50-80%. For context, turning off a random channel has an effect of less than one percent. The model builds up channel 416 in its last…
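The quoted ablation can be reproduced in miniature. In this numpy sketch the channel index and magnitudes are made up, and a final LayerNorm stands in for the rest of the model; the point is only the shape of the measurement (relative output change when one residual channel is zeroed):

```python
import numpy as np

def ablation_effect(model, x, channel):
    """Relative change in model output when one channel is zeroed."""
    base = model(x)
    x_abl = x.copy()
    x_abl[..., channel] = 0.0
    return np.linalg.norm(model(x_abl) - base) / np.linalg.norm(base)

def model(h, eps=1e-6):
    # Final LayerNorm as a stand-in readout.
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 32))         # toy residual stream
x[:, 7] += 40.0                       # stand-in for DINOv3's channel 416

print(f"outlier channel: {ablation_effect(model, x, 7):.0%}")
print(f"random channel:  {ablation_effect(model, x, 0):.0%}")
```

As in the quoted finding, the outlier channel's effect is orders of magnitude larger than a random channel's, even though it is "just one dimension".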
@maxseitzer
Max Seitzer
14 days
3) Web ≥ satellite model. This is a surprising one! Our main web model outperforms the satellite model on some geospatial tasks 🤯 Goes to show the power of massive datasets for generalization. Would be interesting to see if this holds for other domains as well, e.g. microscopy!
@maxseitzer
Max Seitzer
14 days
2) Minimal performance loss for distilled models. We compress the big 7B model into more practical versions like the 840M H+ and 300M L — with minimal loss despite 8-23x reduction in params! The L especially shines on dense tasks relative to its size. Best of both worlds!
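The quoted "8-23x" range follows directly from the nominal parameter counts named above (7B teacher, 840M and 300M students):

```python
# Compression ratio of each distilled student vs. the 7B teacher.
teacher = 7_000_000_000
students = {"H+ (840M)": 840_000_000, "L (300M)": 300_000_000}
for name, params in students.items():
    print(f"{name}: {teacher / params:.1f}x fewer parameters")
```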
@maxseitzer
Max Seitzer
14 days
This is a result of high resolution adaptation (Sec 5.1)! Before, we saw performance dropping at higher resolutions for dense tasks. After, we get better results with higher resolutions, as it should be.
@maxseitzer
Max Seitzer
14 days
1) Scaling to extreme resolutions. Even though the model is trained at max. 768px inputs, it can handle WAY more than that. Features don’t degrade, they become more crisp! Tested up to 4k. This is a property emerging for the larger models (≥L), see Fig 17.
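To see why running at 4k is nontrivial, it helps to count patch tokens, assuming the ViT's 16 px patch size: token count (and hence attention cost) grows quadratically with resolution.

```python
# Patch-token counts at increasing input resolution (16 px patches assumed).
patch = 16
for res in (256, 768, 1024, 2048, 4096):
    side = res // patch
    print(f"{res:>4}px → {side}×{side} grid = {side * side:,} patch tokens")
```

So the jump from the 768 px training resolution to 4k inference is roughly a 28x increase in token count, which makes feature stability at that scale an emergent property rather than a given.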
@maxseitzer
Max Seitzer
14 days
The DINOv3 paper is now available on arXiv. Have you looked at the paper yet? Here are three observations that might not be immediately obvious from a first read 👇
[Linked: arxiv.org — "Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being…"]
@maxseitzer
Max Seitzer
15 days
RT @arkitus: This figure from the impressive DINOv3 paper is fun to think about. Pretend it's 2018 and you're deciding what research to foc…
@maxseitzer
Max Seitzer
15 days
RT @TimDarcet: hey we heard you liked dinov2 so we got you more of the same shit. dinov3 is like dinov2 in the sense that it's much better…
@maxseitzer
Max Seitzer
15 days
RT @MarcSzafraniec: Proud to have contributed to the ground-breaking DINOv3 by reaching the SOTA on COCO Object Detection, for the first ti…
@maxseitzer
Max Seitzer
15 days
RT @AIatMeta: Introducing DINOv3: a state-of-the-art computer vision model trained with self-supervised learning (SSL) that produces powerf…
@maxseitzer
Max Seitzer
15 days
RT @BaldassarreFe: Say hello to DINOv3 🦖🦖🦖. A major release that raises the bar of self-supervised vision foundation models. With stunning…
@maxseitzer
Max Seitzer
15 days
@TimDarcet, @TheoMoutakanni, Leonel Sentana, @_claireroberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, @julienmairal, @hjegou, @monsieurlabatut, @p_bojanowski. And of course, it’s open source! 📜 Paper:
[Linked: github.com — Reference PyTorch implementation and models for DINOv3 (facebookresearch/dinov3)]
@maxseitzer
Max Seitzer
15 days
Immensely proud to have been part of this project. Thank you to the team: @oriane_simeoni, @huyvvo, @BaldassarreFe, Maxime Oquab, Cijo Jose, Vasil Khalidov, @MarcSzafraniec, Seungeun Yi, @MichaelRamamon, @fvsmassa, @d_haziza, @LucaWehrstedt, @jianyuan_wang, …
@maxseitzer
Max Seitzer
15 days
And here’s my favorite figure from the paper, showing high resolution DINOv3 representations in all their glory ✨