Christian Wolf (🦋🦋🦋)
@chriswolfvision
8K Followers · 10K Following · 962 Media · 6K Statuses
Principal Scientist, @NaverLabsEurope, Lead of Spatial AI team. AI for Robotics. Feedback: https://t.co/uD0Z0OSHEX
Lyon, France
Joined November 2015
My excellent colleagues at @naverlabseurope have published a new model for human mesh recovery, high-performing and without using any 3D scans. Great work!
Meet Anny, our Free (Apache 2.0) and Interpretable Human Body Model for all ages. Anny is built upon #MakeHuman and enables SOTA results in Human Mesh Recovery. ArXiv: https://t.co/vUrpD9tyBJ Demo: https://t.co/WpigEQJhUD Code: https://t.co/zUL0AqBt1d
@naverlabseurope
"Sliding is all you need" (aka "What really matters in image goal navigation") has been accepted to 3DV 2026 (@3DVconf) as an Oral presentation! By Gianluca Monaci, @WeinzaepfelP and myself. @naverlabseurope
In a new paper led by Gianluca Monaci, with @WeinzaepfelP and myself, we explore the relationship between relative pose estimation and image goal navigation and study different architectures: late fusion, channel concatenation, space-to-depth and cross-attention. https://t.co/9HZNcoaxtX 🧵1/5
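The four fusion variants can be sketched in a few lines. This is a toy numpy sketch with a stand-in pooling encoder, not the paper's actual architecture: all shapes, helper names and the per-pixel tokenization are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two images (H x W x C), e.g. current observation and goal.
H, W, C, D = 8, 8, 3, 16
obs = rng.standard_normal((H, W, C))
goal = rng.standard_normal((H, W, C))

def encode(x, out_dim=D):
    """Stand-in encoder: global average pool followed by a fixed linear map."""
    w = np.ones((x.shape[-1], out_dim)) / x.shape[-1]
    return x.mean(axis=(0, 1)) @ w

# (1) Late fusion: encode each image separately, then combine the embeddings.
late = np.concatenate([encode(obs), encode(goal)])           # (2D,)

# (2) Channel concatenation: stack the images along channels before encoding.
chan_cat = encode(np.concatenate([obs, goal], axis=-1))      # (D,)

# (3) Space-to-depth: fold 2x2 spatial blocks into channels first, so the
#     encoder sees the concatenated pair at reduced spatial resolution.
def space2depth(x, r=2):
    h, w, c = x.shape
    return (x.reshape(h // r, r, w // r, r, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(h // r, w // r, r * r * c))

s2d = encode(space2depth(np.concatenate([obs, goal], axis=-1)))  # (D,)

# (4) Cross-attention: tokens of one image attend to tokens of the other.
def cross_attention(q_tokens, kv_tokens):
    scores = q_tokens @ kv_tokens.T / np.sqrt(q_tokens.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ kv_tokens

obs_tokens = obs.reshape(-1, C)    # one token per pixel, for simplicity
goal_tokens = goal.reshape(-1, C)
fused = cross_attention(obs_tokens, goal_tokens)             # (H*W, C)

print(late.shape, chan_cat.shape, s2d.shape, fused.shape)
```

The key structural difference is where the two images interact: only at the very end (late fusion), at the input (channel concatenation and space-to-depth), or at every token pair (cross-attention).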
Kinaema is trained with pose supervision and optionally with masked image modelling. Interestingly, even when trained with relative pose only, probing experiments show that occupancy maps are encoded in the maintained memory. 9/9
Kinaema's memory is composed of a set of embeddings, and the attention of the same scene patch to these embeddings seems to follow a stable pattern. 8/9
We outperform classical recurrent sequence models, including other recurrent transformers. We train on sequences of length T=100 and show generalization up to T=800 and T=1000, which we believe is unprecedented. 7/9
At a very high level, the difference between classical relative pose estimation and Kinaema is: relative pose estimation compares the camera poses of 2 images, while Kinaema estimates the relative pose between an image and the agent's memory, where the memory holds the scene and the current agent position. 6/9
The model and memory are integrated into our SoTA ImageNav agent DEBiT (ICLR 2024, https://t.co/tCtfqHcmur). The augmented agent gets directional information from two sources: comparing observation and goal with DEBiT, and comparing memory and goal with the new Kinaema model. 5/9
Our goals were: (G1) Recurrence: be fast and robotics-friendly, with O(1) updates! (G2) Memory capacity: scale memory without affecting network capacity, in contrast to LSTMs, GRUs, Mamba etc. (G3) Stability: integrate gating into recurrent transformers. 4/9
Kinaema's memory is recurrent, i.e. it is updated in O(1), compared to the O(N^2) of classical transformers or O(N) of causal transformers. It uses a transformer to update its recurrent memory, does not have a limited context length, and does not need to store observations. 3/9
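The O(1) property and the gating idea can be illustrated with a minimal numpy sketch: a fixed set of memory slots cross-attends to the tokens of each new frame, and a sigmoid gate blends old and new content. This is an invented toy, not the published model; all dimensions and the gating formula are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P, D = 4, 16, 8   # memory slots, patch tokens per observation, embed dim

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def update_memory(memory, obs_tokens):
    """One O(1) recurrent step: memory slots cross-attend to the new
    observation's tokens; a sigmoid gate blends old and new content.
    Cost depends only on (M, P, D), never on how many steps came before."""
    attn = softmax(memory @ obs_tokens.T / np.sqrt(D))   # (M, P)
    candidate = attn @ obs_tokens                        # (M, D)
    gate = 1.0 / (1.0 + np.exp(-(memory * candidate).sum(-1, keepdims=True)))
    return gate * candidate + (1.0 - gate) * memory      # (M, D)

memory = np.zeros((M, D))
for t in range(100):                    # arbitrary sequence length
    obs = rng.standard_normal((P, D))   # token embeddings of frame t
    memory = update_memory(memory, obs)

print(memory.shape)  # memory stays (M, D): fixed size, unbounded context
```

Because each step only touches the fixed-size memory and the current frame, observations never need to be stored and the context length is unbounded, matching the three properties listed in the post.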
Kinaema integrates visual observations while moving through a potentially large scene and maintains a latent memory of the scene. Upon request, it takes a query image and predicts the relative position of the shown space with respect to the agent's current position. 2/9
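The read-out side of this idea can be sketched in the same toy numpy style: query-image tokens attend to the fixed-size memory, are pooled, and a linear head regresses a 2D offset. The head, pooling and shapes are hypothetical, chosen only to show the information flow.

```python
import numpy as np

rng = np.random.default_rng(1)
M, Q, D = 4, 16, 8   # memory slots, query tokens, embed dim

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_relative_position(memory, query_tokens, w_out):
    """Query tokens cross-attend to the memory, are mean-pooled, and a
    linear head regresses a 2D offset: the queried place relative to
    the agent's current position."""
    attn = softmax(query_tokens @ memory.T / np.sqrt(D))  # (Q, M)
    read = attn @ memory                                  # (Q, D)
    return read.mean(axis=0) @ w_out                      # (2,)

memory = rng.standard_normal((M, D))       # latent scene memory
query = rng.standard_normal((Q, D))        # tokens of the query image
w_out = rng.standard_normal((D, 2))        # toy regression head
offset = predict_relative_position(memory, query, w_out)
print(offset.shape)
```

The point of the sketch is that the prediction is made against the memory rather than against a second image, which is the conceptual difference from classical two-image relative pose estimation.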
We have a new sequence model for robotics, which will be presented at #NeurIPS2025: Kinaema: A recurrent sequence model for memory and pose in motion https://t.co/4ytorM2JqW By @mbsariyildiz, @WeinzaepfelP, @_WGW101, G. Monaci and myself. @naverlabseurope 1/9
With retrieval, we further improve the SoTA on ImageNav and enable zero-shot use cases. This is work by G. Monaci, R.S. Rezende, @RDeffayet, @kgcs96, @_WGW101, H. Déjean, @sclincha and myself.
This work builds heavily on the geometric foundation model of our SoTA DEBiT agent (ICLR 2024): https://t.co/warJFQ3g7b In RANa we leverage this geometric FM for navigation, but also to process the retrieval context and extract directional information:
Our paper "RANa: Retrieval-Augmented Navigation" has been accepted at TMLR! We retrieve waypoint images from a database and leverage the ImageNav capabilities of our SOTA agent DEBiT to navigate to these waypoints, improving performance. https://t.co/BXLUh0VaVh
@naverlabseurope
We target MSc interns, and if the internship is successful we can propose a PhD position starting in Fall 2026.
We have a new internship position open in our team, on AI for robotics: manipulation using 3D foundation models. @naverlabseurope With Sorbonne University / ISIR (Nicolas Thome) You can apply online: https://t.co/poiATq0mEI
Naver Labs Europe organizes the 4th edition of its Workshop on AI for Robotics in the French Alps (Grenoble). This year's topic is 'Spatial AI', and registration is open!
Major announcement ✨registration is OPEN✨ AI for Robotics workshop (4th edition) 🗓️Nov 21-22 Grenoble, France! https://t.co/XivExHLzO4 Confirmed speakers: Andrew Davison - Nicolas Mansard - @CordeliaSchmid - Marc Pollefeys - Michael Gienger - David Novotny - Andrea Vedaldi -
References: This new paper and study: https://t.co/9HZNcoaxtX Binocular encoders for RPE and image-goal navigation (ICLR 2024): https://t.co/tCtfqHcUjZ 5/5
We show that navigation and RPE performance can be partially transferred from simple to realistic simulation, but cannot be fully recovered. Binocular ViTs with cross-attention plus pre-training for relative pose estimation (RPE) remain the most efficient solution for image goal navigation. 4/5
We show that navigation training only works when the simulator setting is easy (sliding along walls is allowed). While the impact of this on sim2real transfer was known, its impact on image goal navigation is surprising: the setting relates to physics, whereas RPE is a computer vision problem. 3/5