Sophia Sirko-Galouchenko Profile
Sophia Sirko-Galouchenko

@sophia_sirko

Followers: 62 · Following: 28 · Media: 5 · Statuses: 14

PhD student in visual representation learning at https://t.co/WkgoTYyOly and Sorbonne Université (MLIA)

Paris, France
Joined April 2015
@sophia_sirko
Sophia Sirko-Galouchenko
4 months
1/n 🚀 New paper out, accepted at @ICCVConference! Introducing DIP: unsupervised post-training that enhances dense features in pretrained ViTs for dense in-context scene understanding. Below: low-shot in-context semantic segmentation examples. DIP features outperform DINOv2!
2
26
120
@valeoai
valeo.ai
11 days
The PhD graduation season in the team goes on! Today Corentin Sautier is defending his PhD on "Learning Actionable LiDAR Representations without Annotations". Good luck! 🚀
@mtmthh
tetianka
11 days
Another great event for @valeoai: a PhD defense of Corentin Sautier. His thesis «Learning Actionable LiDAR Representations w/o Annotations» covers the papers BEVContrast (learning self-sup LiDAR features), SLidR, ScaLR (distillation), UNIT and Alpine (solving tasks w/o labels).
2
2
15
@valeoai
valeo.ai
12 days
It’s PhD graduation season in the team! Today, @Bjoern_Michele is defending his PhD on "Domain Adaptation for 3D Data". Best of luck! 🚀
1
5
20
@shawshank_v
Shashank
3 months
Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, DINOv2 on various benchmarks setting a new standard for open-source research🧵
13
57
274
@abursuc
Andrei Bursuc
4 months
1/ New & old work on self-supervised representation learning (SSL) with ViTs: MOCA ☕ - Predicting Masked Online Codebook Assignments w/ @SpyrosGidaris @oriane_simeoni @AVobecky @quobbe N. Komodakis, P. Pérez #TMLR #ICLR2025 Grab a ☕ and brace for a story & a 🧵
1
14
48
@shawshank_v
Shashank
4 months
New paper out - accepted at @ICCVConference We introduce MoSiC, a self-supervised learning framework that learns temporally consistent representations from video using motion cues. Key idea: leverage long-range point tracks to enforce dense feature coherence across time.🧵
2
24
129
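The key idea in the MoSiC tweet above, enforcing dense feature coherence along long-range point tracks, can be illustrated with a toy loss: pull each track's per-frame features toward the track mean. This is a minimal sketch of the idea only; the normalization, the track-mean target, and the function name are illustrative assumptions, not MoSiC's actual objective.

```python
import numpy as np

def track_coherence_loss(track_feats):
    """track_feats: (num_tracks, num_frames, dim) features sampled along
    long-range point tracks. Encourage each track's per-frame features to
    agree with the track mean (a simplified stand-in for dense temporal
    consistency)."""
    # L2-normalize features and the per-track mean
    f = track_feats / np.linalg.norm(track_feats, axis=-1, keepdims=True)
    mean = f.mean(axis=1, keepdims=True)
    mean = mean / np.linalg.norm(mean, axis=-1, keepdims=True)
    cos = (f * mean).sum(axis=-1)   # cosine(per-frame feature, track mean)
    return 1.0 - cos.mean()         # 0 when features are identical along tracks
```

The loss is zero when a tracked point's features are identical in every frame, and grows as they drift apart over time.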
@sophia_sirko
Sophia Sirko-Galouchenko
4 months
6/n Benefits 💪
- Post-training takes < 9h on a single A100 GPU.
- Improves across 6 segmentation benchmarks.
- Boosts performance for in-context depth prediction.
- Plug-and-play for different ViTs: DINOv2, CLIP, MAE.
- Robust in low-shot and domain-shift settings.
1
0
6
@sophia_sirko
Sophia Sirko-Galouchenko
4 months
5/n Why is DIP unsupervised? DIP doesn't require manually annotated segmentation masks for its post-training. To accomplish this, it leverages Stable Diffusion (via DiffCut) alongside DINOv2R features to automatically construct in-context pseudo-tasks for its post-training.
1
1
4
@sophia_sirko
Sophia Sirko-Galouchenko
4 months
4/n Meet Dense In-context Post-training (DIP)! 🔄
- Meta-learning inspired: adopts episodic training principles.
- Task-aligned: explicitly mimics downstream dense in-context tasks during post-training.
- Purpose-built: optimizes the model for dense in-context performance.
1
0
5
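The episodic, task-aligned training DIP describes can be illustrated with a toy episode: split pseudo-labeled patch features into a support (prompt) set and a query set, predict query labels by soft nearest-neighbor over the support, and score the prediction with cross-entropy. This is a minimal sketch under assumed details (soft-NN scoring, the temperature value, the function name), not the paper's exact loss; the real method would backpropagate such a loss into the ViT, whereas here we only compute it.

```python
import numpy as np

def episode_loss(feats, pseudo_labels, n_support, temp=0.1):
    """One episodic pseudo-task: support patches act as the labeled prompt,
    query patches are predicted by temperature-scaled soft nearest-neighbor
    over the support, and cross-entropy scores the prediction."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sup_f, qry_f = f[:n_support], f[n_support:]
    sup_y, qry_y = pseudo_labels[:n_support], pseudo_labels[n_support:]
    sim = qry_f @ sup_f.T / temp                       # (n_query, n_support)
    p = np.exp(sim - sim.max(axis=1, keepdims=True))   # stable softmax
    p /= p.sum(axis=1, keepdims=True)
    # probability mass each query patch assigns to each class
    n_cls = pseudo_labels.max() + 1
    onehot = np.eye(n_cls)[sup_y]                      # (n_support, n_cls)
    class_p = p @ onehot                               # (n_query, n_cls)
    return -np.log(class_p[np.arange(len(qry_y)), qry_y] + 1e-9).mean()
```

Because the episode has the same shape as the downstream task (labeled prompt patches, unlabeled query patches), minimizing this loss directly optimizes the features for in-context retrieval.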
@sophia_sirko
Sophia Sirko-Galouchenko
4 months
3/n Most unsupervised (post-)training methods for dense in-context scene understanding rely on self-distillation frameworks with (somewhat) complicated objectives and network components. Hard to interpret, tricky to tune. Is there a simpler alternative? 👀
1
0
5
@sophia_sirko
Sophia Sirko-Galouchenko
4 months
2/n What is dense in-context scene understanding? Dense prediction tasks are formulated as nearest-neighbor retrieval problems, using patch feature similarities between the query and the labeled prompt images (introduced in @ibalazevic et al.’s HummingBird; figure below from their work).
1
0
5
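The nearest-neighbor retrieval formulation above is simple to sketch: each query patch takes the label of its most similar prompt patch. A minimal illustration on toy features (the cosine-similarity choice and all names here are assumptions for the sketch, not the exact HummingBird recipe):

```python
import numpy as np

def in_context_segment(prompt_feats, prompt_labels, query_feats):
    """Label each query patch with the class of its nearest prompt patch
    under cosine similarity."""
    # L2-normalize so dot products are cosine similarities
    p = prompt_feats / np.linalg.norm(prompt_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sim = q @ p.T                  # (num_query, num_prompt) similarity matrix
    nearest = sim.argmax(axis=1)   # index of the most similar prompt patch
    return prompt_labels[nearest]

# Toy example: 4 prompt patches (2 classes), 3 query patches
prompt_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
prompt_labels = np.array([0, 0, 1, 1])
query_feats = np.array([[0.8, 0.2], [0.2, 0.8], [1.0, 0.05]])
print(in_context_segment(prompt_feats, prompt_labels, query_feats))  # -> [0 1 0]
```

No task-specific head is trained: segmentation quality depends entirely on how well the patch features separate classes, which is exactly what DIP's post-training targets.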
@Mldhug
Hugo
1 year
You want to give audio abilities to your VLM without compromising its vision performance? You want to align your audio encoder with a pretrained image encoder without suffering from the modality gap? Check our #NeurIPS2024 paper with @michelolzam @Steph_lat and Slim Essid
1
3
19
@Mldhug
Hugo
2 years
The preprint of our work (with @salah_zaiem and @AlgayresR) on sample-dependent ASR model selection is available on arXiv! In this paper we propose training a decision module that, given an audio sample, selects the smallest model sufficient to produce a good transcription.
1
4
11
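The selection scheme in the tweet above can be sketched as a cascade over models ordered from smallest to largest: the decision module predicts, per sample, whether each model would transcribe it well, and the first model predicted to suffice is used. The interface, names, and threshold here are illustrative assumptions, not the paper's API.

```python
def select_model(audio_feats, decision_module, models, threshold=0.8):
    """Pick the smallest model predicted to transcribe this sample well.
    `models` is ordered small -> large; `decision_module(audio_feats, name)`
    returns a predicted-quality score in [0, 1]."""
    for name in models:
        if decision_module(audio_feats, name) >= threshold:
            return name        # smallest sufficient model for this sample
    return models[-1]          # no model predicted sufficient: use the largest
```

Easy samples thus get routed to cheap models, and the expensive model is reserved for the samples that actually need it.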