Christian Wolf (🦋🦋🦋)
@chriswolfvision
8K Followers · 10K Following · 962 Media · 6K Statuses
Principal Scientist, @NaverLabsEurope, Lead of Spatial AI team. AI for Robotics. Feedback: https://t.co/uD0Z0OSHEX
Lyon, France
Joined November 2015
My excellent colleagues at @naverlabseurope have published a new model for human mesh recovery, high-performing and without using any 3D scans. Great work!
Meet Anny, our Free (Apache 2.0) and Interpretable Human Body Model for all ages. Anny is built upon #MakeHuman and enables SOTA results in Human Mesh Recovery. ArXiv: https://t.co/vUrpD9tyBJ Demo: https://t.co/WpigEQJhUD Code: https://t.co/zUL0AqBt1d
@naverlabseurope
"Sliding is all you need" (aka "What really matters in image goal navigation") has been accepted to 3DV 2026 (@3DVconf) as an Oral presentation! By Gianluca Monaci, @WeinzaepfelP and myself. @naverlabseurope
In a new paper led by Gianluca Monaci, with @WeinzaepfelP and myself, we explore the relationship between relative pose estimation and image goal navigation and study different architectures: late fusion, channel concatenation, space-to-depth and cross-attention. https://t.co/9HZNcoaxtX 🧵1/5
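The four fusion variants can be sketched in a few lines. This is a toy numpy sketch with a stand-in pooling encoder, not the paper's actual architecture: all shapes, helper names and the per-pixel tokenization are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two images (H x W x C), e.g. current observation and goal.
H, W, C, D = 8, 8, 3, 16
obs = rng.standard_normal((H, W, C))
goal = rng.standard_normal((H, W, C))

def encode(x, out_dim=D):
    """Stand-in encoder: global average pool followed by a fixed linear map."""
    w = np.ones((x.shape[-1], out_dim)) / x.shape[-1]
    return x.mean(axis=(0, 1)) @ w

# (1) Late fusion: encode each image separately, then combine the embeddings.
late = np.concatenate([encode(obs), encode(goal)])           # (2D,)

# (2) Channel concatenation: stack the images along channels before encoding.
chan_cat = encode(np.concatenate([obs, goal], axis=-1))      # (D,)

# (3) Space-to-depth: fold 2x2 spatial blocks into channels first, so the
#     encoder sees the concatenated pair at reduced spatial resolution.
def space2depth(x, r=2):
    h, w, c = x.shape
    return (x.reshape(h // r, r, w // r, r, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(h // r, w // r, r * r * c))

s2d = encode(space2depth(np.concatenate([obs, goal], axis=-1)))  # (D,)

# (4) Cross-attention: tokens of one image attend to tokens of the other.
def cross_attention(q_tokens, kv_tokens):
    scores = q_tokens @ kv_tokens.T / np.sqrt(q_tokens.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ kv_tokens

obs_tokens = obs.reshape(-1, C)    # one token per pixel, for simplicity
goal_tokens = goal.reshape(-1, C)
fused = cross_attention(obs_tokens, goal_tokens)             # (H*W, C)

print(late.shape, chan_cat.shape, s2d.shape, fused.shape)
```

The key structural difference is where the two images interact: only at the very end (late fusion), at the input (channel concatenation and space-to-depth), or at every token pair (cross-attention).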
Kinaema is trained with pose supervision and optionally with masked image modelling. Interestingly, even when trained with relative pose only, probing experiments show that occupancy maps are encoded in the maintained memory. 9/9
Kinaema's memory is composed of a set of embeddings, and the attention of the same scene patch to these embeddings seems to follow a stable pattern. 8/9
We outperform classical recurrent sequence models, including other recurrent transformers. We train on sequences of length T=100 and show generalization up to T=800 and T=1000, which we believe is unprecedented. 7/9
At a very high level, the difference between classical relative pose estimation and Kinaema is: relative pose estimation compares the camera poses of 2 images, while Kinaema estimates the relative pose between an image and the agent's memory, where the memory holds the scene and the current agent position. 6/9
The model and memory are integrated into our SoTA ImageNav agent DEBiT (ICLR 2024, https://t.co/tCtfqHcmur). The augmented agent gets directional information from two sources: comparing observation and goal with DEBiT, and comparing memory and goal with the new Kinaema model. 5/9
Our goals were: (G1) Recurrence: be fast and robotics-friendly, with O(1) updates! (G2) Memory capacity: scale memory without affecting network capacity, in contrast to LSTMs, GRUs, Mamba etc. (G3) Stability: integrate gating into recurrent transformers. 4/9
Kinaema's memory is recurrent, i.e. it is updated in O(1), compared to the O(N^2) of classical transformers or O(N) of causal transformers. It uses a transformer to update its recurrent memory, does not have a limited context length, and does not need to store observations. 3/9
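The O(1) property and the gating idea can be illustrated with a minimal numpy sketch: a fixed set of memory slots cross-attends to the tokens of each new frame, and a sigmoid gate blends old and new content. This is an invented toy, not the published model; all dimensions and the gating formula are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P, D = 4, 16, 8   # memory slots, patch tokens per observation, embed dim

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def update_memory(memory, obs_tokens):
    """One O(1) recurrent step: memory slots cross-attend to the new
    observation's tokens; a sigmoid gate blends old and new content.
    Cost depends only on (M, P, D), never on how many steps came before."""
    attn = softmax(memory @ obs_tokens.T / np.sqrt(D))   # (M, P)
    candidate = attn @ obs_tokens                        # (M, D)
    gate = 1.0 / (1.0 + np.exp(-(memory * candidate).sum(-1, keepdims=True)))
    return gate * candidate + (1.0 - gate) * memory      # (M, D)

memory = np.zeros((M, D))
for t in range(100):                    # arbitrary sequence length
    obs = rng.standard_normal((P, D))   # token embeddings of frame t
    memory = update_memory(memory, obs)

print(memory.shape)  # memory stays (M, D): fixed size, unbounded context
```

Because each step only touches the fixed-size memory and the current frame, observations never need to be stored and the context length is unbounded, matching the three properties listed in the post.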
Kinaema integrates visual observations while moving through a potentially large scene and maintains a latent memory of the scene. Upon request, it takes a query image and predicts the relative position of the shown space with respect to the agent's current position. 2/9
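The read-out side of this idea can be sketched in the same toy numpy style: query-image tokens attend to the fixed-size memory, are pooled, and a linear head regresses a 2D offset. The head, pooling and shapes are hypothetical, chosen only to show the information flow.

```python
import numpy as np

rng = np.random.default_rng(1)
M, Q, D = 4, 16, 8   # memory slots, query tokens, embed dim

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_relative_position(memory, query_tokens, w_out):
    """Query tokens cross-attend to the memory, are mean-pooled, and a
    linear head regresses a 2D offset: the queried place relative to
    the agent's current position."""
    attn = softmax(query_tokens @ memory.T / np.sqrt(D))  # (Q, M)
    read = attn @ memory                                  # (Q, D)
    return read.mean(axis=0) @ w_out                      # (2,)

memory = rng.standard_normal((M, D))       # latent scene memory
query = rng.standard_normal((Q, D))        # tokens of the query image
w_out = rng.standard_normal((D, 2))        # toy regression head
offset = predict_relative_position(memory, query, w_out)
print(offset.shape)
```

The point of the sketch is that the prediction is made against the memory rather than against a second image, which is the conceptual difference from classical two-image relative pose estimation.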
We have a new sequence model for robotics, which will be presented at #NeurIPS2025: Kinaema: A recurrent sequence model for memory and pose in motion https://t.co/4ytorM2JqW By @mbsariyildiz, @WeinzaepfelP, @_WGW101, G. Monaci and myself. @naverlabseurope 1/9
With retrieval, we further improve the SoTA on ImageNav and enable zero-shot use cases. This is work by G. Monaci, R.S. Rezende, @RDeffayet, @kgcs96, @_WGW101, H. Déjean, @sclincha and myself.
This work builds heavily on the geometric foundation model of our SoTA DEBiT agent (ICLR 2024): https://t.co/warJFQ3g7b In RANa we leverage this geometric FM for navigation, but also to process the retrieval context and extract directional information:
Our paper "RANa: Retrieval-Augmented Navigation" has been accepted at TMLR! We retrieve waypoint images from a database and leverage the ImageNav capabilities of our SOTA agent DEBiT to navigate to these waypoints, improving performance. https://t.co/BXLUh0VaVh
@naverlabseurope
We target MSc interns, and if the internship is successful we can propose a PhD position starting in Fall 2026.
We have a new internship position open in our team, on AI for robotics: manipulation using 3D foundation models. @naverlabseurope With Sorbonne University / ISIR (Nicolas Thome) You can apply online: https://t.co/poiATq0mEI
Naver Labs Europe organizes the 4th edition of its Workshop on AI for Robotics in the French Alps (Grenoble). This year's topic is 'Spatial AI', and registration is open!
Major announcement ✨registration is OPEN✨ AI for Robotics workshop (4th edition) 🗓️Nov 21-22 Grenoble, France! https://t.co/XivExHLzO4 Confirmed speakers: Andrew Davison - Nicolas Mansard - @CordeliaSchmid - Marc Pollefeys - Michael Gienger - David Novotny - Andrea Vedaldi -
References: This new paper and study: https://t.co/9HZNcoaxtX Binocular encoders for RPE and image-goal navigation (ICLR 2024): https://t.co/tCtfqHcUjZ 5/5
We show that navigation and RPE performance can be partially transferred from simple to realistic simulation, but cannot be fully recovered. Binocular ViTs with cross-attention plus pre-training for relative pose estimation (RPE) remain the most efficient solution for image goal navigation. 4/5
We show that navigation training only works when the simulator setting is easy (sliding along walls is allowed). While the impact of this on sim2real transfer was known, its impact on image goal navigation is surprising: the setting relates to physics, whereas RPE is a computer vision problem. 3/5