
Sophia Sirko-Galouchenko
@sophia_sirko
Followers: 62 · Following: 28 · Media: 5 · Statuses: 14
PhD student in visual representation learning at https://t.co/WkgoTYyOly and Sorbonne Université (MLIA)
Paris, France
Joined April 2015
1/n 🚀 New paper out - accepted at @ICCVConference! Introducing DIP: unsupervised post-training that enhances dense features in pretrained ViTs for dense in-context scene understanding. Below: low-shot in-context semantic segmentation examples. DIP features outperform DINOv2!
2 replies · 26 reposts · 120 likes
PhD graduation season in the team continues! Today Corentin Sautier is defending his PhD on "Learning Actionable LiDAR Representations without Annotations". Good luck! 🚀
Another great event for @valeoai: the PhD defense of Corentin Sautier. His thesis, «Learning Actionable LiDAR Representations w/o Annotations», covers the papers BEVContrast (self-supervised LiDAR feature learning), SLidR and ScaLR (distillation), and UNIT and Alpine (solving tasks without labels).
2 replies · 2 reposts · 15 likes
It’s PhD graduation season in the team! Today, @Bjoern_Michele is defending his PhD on "Domain Adaptation for 3D Data". Best of luck! 🚀
1 reply · 5 reposts · 20 likes
Can open-data models beat DINOv2? Today we release Franca, a fully open-source vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research 🧵
13 replies · 57 reposts · 274 likes
1/ New & old work on self-supervised representation learning (SSL) with ViTs: MOCA ☕ - Predicting Masked Online Codebook Assignments w/ @SpyrosGidaris @oriane_simeoni @AVobecky @quobbe N. Komodakis, P. Pérez #TMLR #ICLR2025 Grab a ☕ and brace for a story & a 🧵
1 reply · 14 reposts · 48 likes
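For readers new to the idea, here is a minimal PyTorch sketch of a masked codebook-assignment objective: a teacher view soft-assigns each patch to the codes of an online codebook, and the student must predict those assignments for the patches it saw masked. This is a generic illustration of the concept, not MOCA's actual implementation; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_assignment_loss(student_logits, teacher_feats, codebook, mask, tau=0.1):
    """Generic masked codebook-assignment objective (sketch, hypothetical names).

    student_logits: (N, K) student predictions over K codes, from the masked view
    teacher_feats:  (N, D) teacher patch features, from the unmasked view
    codebook:       (K, D) online codebook entries
    mask:           (N,) bool, True where the patch was masked for the student
    """
    with torch.no_grad():
        sims = F.normalize(teacher_feats, dim=-1) @ F.normalize(codebook, dim=-1).T
        targets = (sims / tau).softmax(dim=-1)   # soft assignment of each patch to codes
    logp = student_logits.log_softmax(dim=-1)
    # Cross-entropy between teacher assignments and student predictions,
    # computed on masked patches only.
    return -(targets[mask] * logp[mask]).sum(dim=-1).mean()
```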
New paper out - accepted at @ICCVConference! We introduce MoSiC, a self-supervised learning framework that learns temporally consistent representations from video using motion cues. Key idea: leverage long-range point tracks to enforce dense feature coherence across time. 🧵
2 replies · 24 reposts · 129 likes
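A minimal PyTorch sketch of track-based dense consistency, assuming precomputed point tracks: features sampled along the same track are pulled together across frames with a contrastive loss. This illustrates the general idea only; it is not the MoSiC objective, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def track_consistency_loss(feat_maps, tracks, tau=0.07):
    """Pull features of the same tracked point together across frames.

    feat_maps: (T, D, H, W) dense feature maps for T video frames
    tracks:    (T, N, 2) coordinates of N long-range point tracks, in the
               [-1, 1] convention expected by F.grid_sample
    """
    T, D, H, W = feat_maps.shape
    grid = tracks.unsqueeze(2)                                # (T, N, 1, 2)
    f = F.grid_sample(feat_maps, grid, align_corners=False)   # (T, D, N, 1)
    f = F.normalize(f.squeeze(-1).permute(0, 2, 1), dim=-1)   # (T, N, D)
    targets = torch.arange(tracks.shape[1])
    loss = 0.0
    for t in range(1, T):
        # (N, N) similarities: the same track in frame 0 and frame t is the positive.
        logits = f[t] @ f[0].T / tau
        loss = loss + F.cross_entropy(logits, targets)
    return loss / (T - 1)
```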
Work done in collaboration with @SpyrosGidaris @AVobecky @abursuc @thomenicolas1. Paper: https://t.co/5JYX9NuWrd GitHub: https://t.co/irZI8BMYF4
#ICCV2025
github.com · sirkosophia/DIP: Official implementation of DIP: Unsupervised Dense In-Context Post-training of Visual Representations
0 replies · 1 repost · 9 likes
6/n Benefits 💪
- Post-trains in < 9h on a single A100 GPU.
- Improves across 6 segmentation benchmarks.
- Boosts performance on in-context depth prediction.
- Plug-and-play with different ViTs: DINOv2, CLIP, MAE.
- Robust in low-shot and domain-shift settings.
1 reply · 0 reposts · 6 likes
5/n Why is DIP unsupervised? DIP doesn't require manually annotated segmentation masks. Instead, it leverages Stable Diffusion (via DiffCut) alongside DINOv2R features to automatically construct in-context pseudo-tasks for post-training.
1 reply · 1 repost · 4 likes
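To make the pseudo-task idea concrete, here is a hypothetical sketch: given unsupervised segments per image (e.g., from DiffCut) and one embedding per segment (e.g., pooled DINOv2R features), clustering the segment embeddings yields pseudo-class ids that are consistent across images, from which support/query episodes can be sampled. None of these names come from the paper.

```python
import torch
import torch.nn.functional as F

def assign_pseudo_classes(seg_embeds, num_pseudo_classes=8, iters=10):
    """Cluster segment embeddings with cosine k-means so that visually similar
    segments across images share a pseudo-class id.

    seg_embeds: (S, D), one row per unsupervised segment (e.g. pooled DINOv2R
                features inside a DiffCut mask)
    """
    x = F.normalize(seg_embeds, dim=-1)
    centroids = x[torch.randperm(len(x))[:num_pseudo_classes]].clone()
    for _ in range(iters):
        assign = (x @ centroids.T).argmax(dim=-1)      # nearest centroid per segment
        for c in range(num_pseudo_classes):
            if (assign == c).any():
                centroids[c] = F.normalize(x[assign == c].mean(0), dim=0)
    return assign

# Hypothetical pipeline around it:
#   masks  = diffcut(image)                 # unsupervised segments (Stable Diffusion)
#   embeds = pool(dinov2r_feats, masks)     # one embedding per segment
#   ids    = assign_pseudo_classes(embeds)  # shared pseudo-labels across images
#   -> sample support/query images over these ids: one pseudo-task, zero human labels
```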
4/n Meet Dense In-context Post-training (DIP)! 🔄
- Meta-learning inspired: adopts episodic training principles.
- Task-aligned: explicitly mimics downstream dense in-context tasks during post-training.
- Purpose-built: optimizes the model for dense in-context performance.
1 reply · 0 reposts · 5 likes
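A minimal sketch of such an episodic loop, under the retrieval formulation described in 2/n: each episode classifies query patches from support patches by soft nearest-neighbor label propagation, and the cross-entropy loss is backpropagated into the ViT. Hypothetical names throughout; this is not the released DIP code.

```python
import torch
import torch.nn.functional as F

def episode_loss(vit, support_imgs, support_labels, query_imgs, query_labels,
                 num_classes, tau=0.1):
    """One episode: classify query patches from support patches by soft
    nearest-neighbor label propagation, then backprop into the ViT.

    vit(imgs) is assumed to return (B, P, D) patch features;
    support_labels / query_labels are flattened per-patch pseudo-class ids.
    """
    s = F.normalize(vit(support_imgs).flatten(0, 1), dim=-1)       # (Ns, D)
    q = F.normalize(vit(query_imgs).flatten(0, 1), dim=-1)         # (Nq, D)
    attn = ((q @ s.T) / tau).softmax(dim=-1)                       # soft NN over support
    probs = attn @ F.one_hot(support_labels, num_classes).float()  # (Nq, C)
    return F.nll_loss(probs.clamp_min(1e-8).log(), query_labels)

# Hypothetical outer loop: every iteration is a freshly sampled pseudo-task.
#   for step in range(num_steps):
#       episode = sample_pseudo_task()
#       loss = episode_loss(vit, *episode, num_classes=C)
#       loss.backward(); opt.step(); opt.zero_grad()
```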
3/n Most unsupervised (post-)training methods for dense in-context scene understanding rely on self-distillation frameworks with (somewhat) complicated objectives and network components. Hard to interpret, tricky to tune. Is there a simpler alternative? 👀
1 reply · 0 reposts · 5 likes
2/n What is dense in-context scene understanding? Dense prediction tasks are formulated as nearest-neighbor retrieval problems, using patch feature similarities between the query and the labeled prompt images (introduced in @ibalazevic et al.'s HummingBird; figure below from their work).
1 reply · 0 reposts · 5 likes
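In code, this retrieval formulation takes only a few lines. A minimal sketch (hypothetical tensor names; not the HummingBird implementation): each query patch is labeled by a similarity-weighted vote over its nearest prompt patches.

```python
import torch
import torch.nn.functional as F

def in_context_segmentation(query_feats, prompt_feats, prompt_labels,
                            k=10, num_classes=21):
    """Label query patches by nearest-neighbor retrieval over prompt patches.

    query_feats:   (Nq, D) patch features of the query image
    prompt_feats:  (Np, D) patch features of the labeled prompt image(s)
    prompt_labels: (Np,)   per-patch class indices of the prompt(s)
    """
    q = F.normalize(query_feats, dim=-1)
    p = F.normalize(prompt_feats, dim=-1)
    sim = q @ p.T                                  # (Nq, Np) cosine similarities
    topk_sim, topk_idx = sim.topk(k, dim=-1)       # k nearest prompt patches per query patch
    onehot = F.one_hot(prompt_labels[topk_idx], num_classes).float()  # (Nq, k, C)
    weights = topk_sim.softmax(dim=-1).unsqueeze(-1)                  # similarity-weighted vote
    scores = (weights * onehot).sum(dim=1)         # (Nq, C) class scores
    return scores.argmax(dim=-1)                   # predicted class per query patch
```

No decoder is trained here: segmentation quality depends entirely on how discriminative the dense features are, which is exactly what DIP post-trains for.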
Want to give audio abilities to your VLM without compromising its vision performance? Want to align your audio encoder with a pretrained image encoder without suffering from the modality gap? Check out our #NeurIPS2024 paper with @michelolzam @Steph_lat and Slim Essid.
1 reply · 3 reposts · 19 likes
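As a baseline picture of what "aligning an audio encoder with a pretrained image encoder" means, here is a generic contrastive-alignment sketch with the image tower frozen (which is what preserves vision performance). This is the standard recipe in which the modality gap arises, not the paper's method; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def audio_image_alignment_loss(audio_encoder, image_encoder, audios, images, tau=0.07):
    """Contrastively align audio embeddings with a frozen image encoder.

    audios / images: a batch of B matched audio-image pairs.
    """
    with torch.no_grad():                                  # frozen image tower
        img = F.normalize(image_encoder(images), dim=-1)   # (B, D)
    aud = F.normalize(audio_encoder(audios), dim=-1)       # (B, D)
    logits = aud @ img.T / tau                 # (B, B): matched pairs on the diagonal
    targets = torch.arange(len(images), device=logits.device)
    # Symmetric InfoNCE over audio->image and image->audio directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```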
The preprint of our work (with @salah_zaiem and @AlgayresR) on sample-dependent ASR model selection is available on arXiv! We propose to train a decision module that, given an audio sample, selects the smallest model sufficient for a good transcription.
1 reply · 4 reposts · 11 likes
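Schematically, the decision-module idea could look like the sketch below: a lightweight head scores each candidate ASR model's suitability for the input audio, and the smallest model that clears a threshold is used. The names, architecture, and selection rule are all hypothetical, not the paper's.

```python
import torch
import torch.nn as nn

class ModelSelector(nn.Module):
    """Lightweight decision module: scores, per audio sample, how suitable each
    candidate ASR model is (a sketch, not the paper's architecture)."""
    def __init__(self, feat_dim, num_models):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                  nn.Linear(256, num_models))

    def forward(self, audio_feats):            # (B, feat_dim) pooled audio features
        return self.head(audio_feats)          # (B, num_models) suitability scores

def transcribe(audio_feats, audio, selector, asr_models, threshold=0.5):
    """asr_models is assumed sorted from smallest to largest; pick the first model
    whose predicted suitability clears the threshold, else fall back to the largest."""
    scores = selector(audio_feats.unsqueeze(0)).sigmoid().squeeze(0)  # (num_models,)
    for model, score in zip(asr_models, scores):
        if score >= threshold:
            return model(audio)
    return asr_models[-1](audio)
```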