Christian Wolf (🦋🦋🦋)

@chriswolfvision

Followers: 8K · Following: 10K · Media: 962 · Statuses: 6K

Principal Scientist, @NaverLabsEurope, Lead of Spatial AI team. AI for Robotics. Feedback: https://t.co/uD0Z0OSHEX

Lyon, France
Joined November 2015
Christian Wolf (🦋🦋🦋) @chriswolfvision · 7 days
A new model for human mesh recovery, high-performing and built without any 3D scans, has been published by my colleagues at @naverlabseurope. Excellent work!
Romain Brégier @romain_bregier · 7 days
Meet Anny, our Free (Apache 2.0) and Interpretable Human Body Model for all ages. Anny is built upon #MakeHuman and enables SOTA results in Human Mesh Recovery. ArXiv: https://t.co/vUrpD9tyBJ Demo: https://t.co/WpigEQJhUD Code: https://t.co/zUL0AqBt1d @naverlabseurope
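For readers unfamiliar with MakeHuman-style body models: the core mechanism is a template mesh deformed by a weighted sum of interpretable blend shapes, one per parameter such as age or height. A toy sketch of that idea (parameter names, dimensions and the linear model are illustrative, not Anny's actual API):

```python
import numpy as np

# Illustrative sizes: a real body model has a fixed mesh topology.
N_VERTICES = 6890
PARAMS = ["age", "height", "weight"]  # interpretable parameters (made up here)

rng = np.random.default_rng(0)
template = rng.standard_normal((N_VERTICES, 3))                   # rest template mesh
blend_shapes = rng.standard_normal((len(PARAMS), N_VERTICES, 3))  # one offset field per parameter

def body_mesh(params: dict[str, float]) -> np.ndarray:
    """Deform the template by a weighted sum of per-parameter blend shapes."""
    offsets = sum(params[name] * blend_shapes[i] for i, name in enumerate(PARAMS))
    return template + offsets

verts = body_mesh({"age": 0.3, "height": -0.5, "weight": 0.1})
print(verts.shape)  # (6890, 3)
```

Because each parameter has a direct physical meaning, the model stays interpretable, which is the selling point over purely statistical shape spaces.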
Christian Wolf (🦋🦋🦋) @chriswolfvision · 7 days
"Sliding is all you need" (aka "What really matters in image goal navigation") has been accepted to 3DV 2026 (@3DVconf ) as an Oral presentation! By Gianluca Monaci, @WeinzaepfelP and myself. @naverlabseurope
Christian Wolf (🦋🦋🦋) @chriswolfvision · 4 months
In a new paper led by Gianluca Monaci, with @WeinzaepfelP and myself, we explore the relationship between relative pose estimation and image goal navigation and study different architectures: late fusion, channel concatenation, space2depth and cross-attention. https://t.co/9HZNcoaxtX 🧵1/5
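For intuition, the variants differ in where the two images are allowed to interact inside the network. A toy sketch of two extremes, late fusion versus cross-attention, assuming ViT-style token inputs (an illustration, not the paper's architectures):

```python
import torch
import torch.nn as nn

D = 256  # token dimension (illustrative)

class LateFusion(nn.Module):
    """Encode each image independently; interaction happens only at the head."""
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(2 * D, 4)  # e.g. a planar pose (x, y, cos, sin)

    def forward(self, tokens_a, tokens_b):
        ea = self.encoder(tokens_a).mean(dim=1)  # pooled embedding of image A
        eb = self.encoder(tokens_b).mean(dim=1)  # pooled embedding of image B
        return self.head(torch.cat([ea, eb], dim=-1))

class CrossAttentionFusion(nn.Module):
    """Tokens of image A attend to tokens of image B (binocular ViT flavour)."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D, num_heads=8, batch_first=True)
        self.head = nn.Linear(D, 4)

    def forward(self, tokens_a, tokens_b):
        fused, _ = self.attn(query=tokens_a, key=tokens_b, value=tokens_b)
        return self.head(fused.mean(dim=1))

a, b = torch.randn(2, 196, D), torch.randn(2, 196, D)  # 14x14 patches each
print(LateFusion(nn.Identity())(a, b).shape)   # torch.Size([2, 4])
print(CrossAttentionFusion()(a, b).shape)      # torch.Size([2, 4])
```

Channel concatenation and space2depth instead merge the two images (or their features) into a single input tensor before encoding, i.e. the images interact from the first layer on.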
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
Kinaema is trained with pose supervision and optionally with masked image modelling. Interestingly, even when trained with relative pose supervision only, we can show through probing experiments that occupancy maps are encoded in the maintained memory. 9/9
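Probing experiments of this kind typically freeze the trained model and fit a small decoder from the latent memory to the target quantity; only the probe's weights are trained. A hedged sketch with invented shapes (the paper's actual probe setup may differ):

```python
import torch
import torch.nn as nn

K, D = 64, 256   # memory slots x slot dimension (illustrative)
H = W = 32       # occupancy map resolution (illustrative)

probe = nn.Linear(K * D, H * W)  # the ONLY trainable weights in this experiment
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

def probe_step(memory: torch.Tensor, occupancy: torch.Tensor) -> float:
    """memory: (B, K, D) frozen latent memory; occupancy: (B, H, W) ground truth."""
    logits = probe(memory.detach().flatten(1)).view(-1, H, W)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, occupancy)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Dummy batch; in practice memory comes from rollouts, occupancy from the sim.
print(probe_step(torch.randn(8, K, D), torch.rand(8, H, W).round()))
```

If such a probe decodes occupancy well above chance, the maps must be encoded in the memory, even though the model was never trained to produce them.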
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
The Kinaema memory is composed of a set of embeddings, and the attention of the same scene patch to these embeddings seems to follow a stable pattern. 8/9
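The "attention of a scene patch to the memory embeddings" can be read off as a softmax distribution over memory slots; a stable pattern means this distribution recurs across scenes and time steps. A minimal sketch of extracting it (shapes invented):

```python
import torch

def patch_to_memory_attention(patch: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    """patch: (D,) one scene-patch token; memory: (K, D) memory embeddings.
    Returns the patch's attention distribution over the K memory slots."""
    d = patch.shape[-1]
    scores = memory @ patch / d ** 0.5  # scaled dot-product
    return torch.softmax(scores, dim=-1)

attn = patch_to_memory_attention(torch.randn(256), torch.randn(64, 256))
print(attn.argmax().item(), round(attn.max().item(), 3))  # dominant slot, its weight
```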
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
We outperform classical recurrent sequence models, including other recurrent transformers. We train on sequences of length T=100 and show generalization up to T=800 and T=1000, which we believe is unprecedented. 7/9
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
At a very high level, the difference between classical relative pose estimation and Kinaema is:
- relative pose estimation compares the camera poses of two images;
- Kinaema estimates the relative pose between an image and the agent's memory, where the memory holds the scene and the current agent position. 6/9
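In code terms, the distinction is a change of signature; a schematic sketch with stub bodies (names are mine, not the paper's API):

```python
import torch
from torch import Tensor

def relative_pose(image_a: Tensor, image_b: Tensor) -> Tensor:
    """Classical RPE: a function of exactly two images; returns the pose of
    camera B in camera A's frame. (Stub standing in for a trained network.)"""
    return torch.zeros(4)  # e.g. (x, y, cos, sin) in the plane

def memory_relative_pose(memory: Tensor, query_image: Tensor) -> Tensor:
    """Kinaema-style query: a function of the agent's latent memory, which
    encodes the scene seen so far AND the agent's current pose, plus a single
    query image; returns the pose of the queried space w.r.t. the agent."""
    return torch.zeros(4)  # stub
```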
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
The model and memory are integrated into our SoTA ImageNav agent DEBiT (ICLR 2024, https://t.co/tCtfqHcmur). The augmented agent gets directional information from two sources:
- comparing the current observation and the goal with DEBiT;
- comparing the memory and the goal with the new Kinaema model. 5/9
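Schematically, the policy then receives both directional estimates at every step; a toy sketch of that information flow (the comparators below are random stand-ins, not DEBiT's or Kinaema's actual interfaces):

```python
import torch

B, D, K = 2, 256, 64
obs, goal = torch.randn(B, D), torch.randn(B, D)
memory = torch.randn(B, K, D)

# Stand-ins for the two trained comparators (illustrative only):
debit_dir   = lambda o, g: torch.randn(B, 4)  # direction from current obs vs goal
kinaema_dir = lambda m, g: torch.randn(B, 4)  # direction from memory vs goal

# The augmented agent's policy consumes both cues alongside the observation.
policy_input = torch.cat([obs, debit_dir(obs, goal), kinaema_dir(memory, goal)], dim=-1)
print(policy_input.shape)  # torch.Size([2, 264])
```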
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
Our goals were:
(G1) Recurrence: be fast and robotics-friendly, with O(1) updates!
(G2) Memory capacity: scale the memory without affecting network capacity, in contrast to LSTMs, GRUs, Mamba, etc.
(G3) Stability: integrate gating into recurrent transformers (sketched below). 4/9
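A minimal sketch of what GRU-style gating on a memory update can look like; the gate parameterization and dimensions are my illustration, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

K, D = 64, 256  # memory slots x dimension (illustrative)
gate_net = nn.Linear(2 * D, D)

def gated_update(memory: torch.Tensor, proposal: torch.Tensor) -> torch.Tensor:
    """memory, proposal: (K, D). A gate near 1 keeps the old memory (stability);
    near 0 it accepts the transformer's proposed update."""
    g = torch.sigmoid(gate_net(torch.cat([memory, proposal], dim=-1)))
    return g * memory + (1 - g) * proposal

m = gated_update(torch.randn(K, D), torch.randn(K, D))
print(m.shape)  # torch.Size([64, 256])
```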
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
Kinaema memory:
- is recurrent, i.e. updated in O(1), compared to O(N^2) for classical transformers or O(N) for causal transformers;
- uses a transformer to update its recurrent memory;
- does not have a limited context length;
- does not need to store observations. 3/9
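Why the update is O(1) per step: the memory is a fixed set of K tokens that cross-attends only to the current frame's tokens, never to the full history, so the per-step cost does not grow with sequence length. A minimal sketch under those assumptions (dimensions invented; in the paper's spirit this would be combined with the gating sketched above):

```python
import torch
import torch.nn as nn

K, D, P = 64, 256, 196  # memory slots, dim, patch tokens per frame (illustrative)

class RecurrentMemoryUpdate(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, D))

    def forward(self, memory: torch.Tensor, frame_tokens: torch.Tensor) -> torch.Tensor:
        """memory: (B, K, D); frame_tokens: (B, P, D) from the CURRENT frame only.
        Cost per step is O(K*P): no context window, no stored past observations."""
        upd, _ = self.attn(query=memory, key=frame_tokens, value=frame_tokens)
        memory = memory + upd
        return memory + self.mlp(memory)

step = RecurrentMemoryUpdate()
memory = torch.zeros(1, K, D)
for _ in range(5):  # one constant-cost update per incoming frame
    memory = step(memory, torch.randn(1, P, D))
print(memory.shape)  # torch.Size([1, 64, 256])
```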
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
Kinaema integrates visual observations while moving through a potentially large scene and maintains a latent memory of the scene. Upon request, it takes a query image and predicts the relative position of the depicted space with respect to its current position. 2/9
Christian Wolf (🦋🦋🦋) @chriswolfvision · 20 days
We have a new sequence model for robotics, which will be presented at #NeurIPS2025: "Kinaema: A recurrent sequence model for memory and pose in motion" https://t.co/4ytorM2JqW By @mbsariyildiz, @WeinzaepfelP, @_WGW101, G. Monaci and myself. @naverlabseurope 1/9
Christian Wolf (🦋🦋🦋) @chriswolfvision · 21 days
With retrieval, we further improve the SoTA on ImageNav and enable zero-shot use cases. This is work by G. Monaci, R.S. Rezende, @RDeffayet, @kgcs96, @_WGW101, H. Déjean, @sclincha and myself.
Christian Wolf (🦋🦋🦋) @chriswolfvision · 21 days
This work builds heavily on the geometric foundation model of our SoTA DEBiT agent (ICLR 2024, https://t.co/warJFQ3g7b). In RANa we leverage this geometric FM for navigation, but also to process the retrieval context and extract directional information.
Christian Wolf (🦋🦋🦋) @chriswolfvision · 21 days
Our paper "RANa: Retrieval-Augmented Navigation" has been accepted at TMLR! We retrieve waypoint images from a database and leverage the ImageNav capabilities of our SOTA agent DEBiT to navigate to these waypoints, improving performance. https://t.co/BXLUh0VaVh @naverlabseurope
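At its simplest, a retrieval-augmented navigation loop embeds the current observation, fetches the nearest waypoint image from the database, and hands it to the ImageNav agent as an intermediate goal. A schematic sketch (not RANa's actual pipeline):

```python
import torch

D, N = 256, 1000                   # embedding dim, database size (illustrative)
db = torch.randn(N, D)             # pre-embedded waypoint images
db = db / db.norm(dim=-1, keepdim=True)

def retrieve_waypoint(obs_embedding: torch.Tensor) -> int:
    """Cosine-similarity nearest neighbour over the waypoint database."""
    q = obs_embedding / obs_embedding.norm()
    return int((db @ q).argmax())

# Each step: retrieve a waypoint image, then let the ImageNav agent
# (DEBiT, in RANa's case) navigate toward it as an intermediate goal.
idx = retrieve_waypoint(torch.randn(D))
print(f"navigate toward waypoint image #{idx}")
```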
Christian Wolf (🦋🦋🦋) @chriswolfvision · 24 days
We target MSc interns and, if the internship is successful, can propose a PhD position starting in Fall 2026.
Christian Wolf (🦋🦋🦋) @chriswolfvision · 24 days
We have a new internship position open in our team, on AI for robotics: manipulation using 3D foundation models. @naverlabseurope With Sorbonne University / ISIR (Nicolas Thome). You can apply online: https://t.co/poiATq0mEI
Christian Wolf (🦋🦋🦋) @chriswolfvision · 4 months
Naver Labs Europe organizes the 4th edition of its Workshop on AI for Robotics in the French Alps (Grenoble). This year's topic is 'Spatial AI', and registration is open!
NAVER LABS Europe @naverlabseurope · 4 months
Major announcement ✨registration is OPEN✨ AI for Robotics workshop (4th edition) 🗓️Nov 21-22 Grenoble, France! https://t.co/XivExHLzO4 Confirmed speakers: Andrew Davison - Nicolas Mansard - @CordeliaSchmid - Marc Pollefeys - Michael Gienger - David Novotny - Andrea Vedaldi -
Christian Wolf (🦋🦋🦋) @chriswolfvision · 4 months
References:
- This new paper and study: https://t.co/9HZNcoaxtX
- Binocular encoders for RPE and image-goal navigation (ICLR 2024): https://t.co/tCtfqHcUjZ
5/5
Christian Wolf (🦋🦋🦋) @chriswolfvision · 4 months
We show that navigation and RPE performance can be partially transferred from simple to realistic simulation, but cannot be fully recovered. Binocular ViTs with cross-attention and pre-training for relative pose estimation (RPE) remain the most efficient solution for image goal navigation. 4/5
Christian Wolf (🦋🦋🦋) @chriswolfvision · 4 months
We show that navigation training only works when the simulation setting is easy (sliding along walls is allowed). While the impact of this on sim2real transfer was known, its impact on image goal navigation is surprising: the setting is related to physics, whereas RPE is a computer vision problem. 3/5