Nikhil Keetha
@Nik__V__
Followers: 2K · Following: 24K · Media: 92 · Statuses: 799
PhD in Robotics @CMU_Robotics @airlabcmu | Visiting Researcher @Meta | Making robots 🤖 see the world 🌍
Joined November 2017
Meet MapAnything: a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art
29 replies · 128 reposts · 723 likes
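The "factored" phrasing above means the network outputs separate geometric factors rather than one fused pointmap. Below is a minimal sketch of how such factors (per-pixel ray directions, depth along each ray, a camera-to-world pose, and a global metric scale) can compose into metric 3D points; the function name and shapes are illustrative assumptions, not MapAnything's actual API:

```python
import numpy as np

def compose_metric_points(ray_dirs, depth, R_c2w, t_c2w, metric_scale):
    """Compose factored outputs into world-frame metric 3D points.

    ray_dirs: (H, W, 3) unit ray directions in the camera frame
    depth:    (H, W)    depth along each ray, up to scale
    R_c2w:    (3, 3)    camera-to-world rotation
    t_c2w:    (3,)      camera-to-world translation, up to scale
    metric_scale: scalar promoting the scene to metric units
    """
    pts_cam = ray_dirs * depth[..., None]   # per-pixel camera-frame points
    pts_world = pts_cam @ R_c2w.T + t_c2w   # rigid transform into the world frame
    return metric_scale * pts_world

# Toy usage: a 2x2 "image" of rays looking straight down +z, depth 2, scale 1.5.
rays = np.zeros((2, 2, 3)); rays[..., 2] = 1.0
pts = compose_metric_points(rays, np.full((2, 2), 2.0), np.eye(3), np.zeros(3), 1.5)
print(pts[..., 2])  # all 3.0
```

Factoring this way is what lets any subset of the quantities (calibration, poses, depth) be supplied as input or predicted as output, since each factor is an explicit, separately usable term.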
Simulation drives robotics progress, but how do we close the reality gap? Introducing GaussGym: an open-source framework for learning locomotion from pixels with ultra-fast parallelized photorealistic rendering across >4,000 iPhone, GrandTour, ARKit, and Veo scenes! Thread 🧵
10 replies · 60 reposts · 290 likes
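For a sense of what "learning locomotion from pixels" looks like mechanically, here is a toy sketch of a policy mapping a batch of rendered RGB observations from many parallel scenes to joint-space actions in one forward pass; the batch size, image size, and 12-DoF action head are invented for illustration and are not GaussGym's real interface:

```python
import torch

# Toy pixel-to-action policy over a batch of parallel scenes.
N, H, W = 1024, 64, 64  # the real system renders >4,000 scenes in parallel

class PixelPolicy(torch.nn.Module):
    """Tiny CNN mapping rendered RGB observations to joint-space actions."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 5, stride=4), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 3, stride=2), torch.nn.ReLU(),
            torch.nn.Flatten(),
            torch.nn.LazyLinear(12),  # hypothetical 12-DoF action head
        )

    def forward(self, rgb):  # rgb: (N, 3, H, W) in [0, 1]
        return torch.tanh(self.net(rgb))

policy = PixelPolicy()
obs = torch.rand(N, 3, H, W)  # stand-in for batched photorealistic renders
actions = policy(obs)         # one forward pass serves all N environments
print(actions.shape)          # torch.Size([1024, 12])
```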
Tune in @ ICCV on Mon @ 10.30am where I talk about everything 3D + Realism:
- Hyperscape: Gaussian Splatting in VR
- FlowR: Flowing from Sparse-2-Dense 3D Recon
- BulletGen: Improving 4D Recon with Bullet-Time Gen
- MapAnything: Universal Feed-Forward Metric 3D Recon 🧵👇
2 replies · 16 reposts · 176 likes
Excited to share our latest work from the ByteDance Seed Depth Anything team - Trace Anything: Representing Any Video in 4D via Trajectory Fields
💻 Project Page: https://t.co/Q390WcWwG4
📄 Paper: https://t.co/NfxT260QWy
📦 Code: https://t.co/r2VbOHyRwL
🤗 Model:
4 replies · 7 reposts · 63 likes
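A trajectory field, as the title suggests, assigns each pixel a 3D trajectory over time. Here is a minimal sketch of such a representation and a query with linear time interpolation, using an invented dense-array layout rather than the paper's actual parameterization:

```python
import numpy as np

# Invented dense layout: one 3D trajectory per first-frame pixel, sampled at
# T timesteps, stored as an (H, W, T, 3) array.
H, W, T = 120, 160, 48
traj_field = np.random.randn(H, W, T, 3).astype(np.float32)

def query(field, u, v, t):
    """Pixel (u, v)'s 3D position at fractional time t via linear interpolation."""
    t0 = int(np.floor(t))
    t1 = min(t0 + 1, field.shape[2] - 1)
    a = t - t0
    return (1 - a) * field[v, u, t0] + a * field[v, u, t1]

print(query(traj_field, u=80, v=60, t=10.5))  # point halfway between frames 10 and 11
```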
Check out the blog from @CarnegieMellon @CMU_Robotics @roboVisionCMU covering MapAnything! Stay tuned on our GitHub repo for some exciting updates soon 🤩
Researchers from the Robotics Institute + Meta Reality Labs have built a model that reconstructs images, camera data or depth scans into 3D maps within a unified system! MapAnything captures both small details and large spaces with high precision 🗺️ https://t.co/0dSLjzfeYc
0 replies · 4 reposts · 35 likes
FlowR: Flowing from Sparse to Dense 3D Reconstructions @TobiasRobotics and 8 co-authors, with @Nik__V__. tl;dr: MASt3R -> 3D rec -> render new views -> flow matching & refine reconstruction with synth views -> better rec https://t.co/fmTkC15rSl
3 replies · 15 reposts · 95 likes
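The tl;dr above is already pseudocode; spelled out as a loop it looks roughly like this, with every step stubbed out (MASt3R, the renderer, and the flow-matching refiner are full models in reality; these names are placeholders, not FlowR's API):

```python
# Every function below is a stub standing in for a full model.

def mast3r_reconstruct(images):
    return {"points": 100 * len(images)}          # stub: sparse reconstruction

def render_novel_views(recon, k=4):
    return [f"novel_{i}" for i in range(k)]       # stub: imperfect synthetic views

def flow_matching_refine(views):
    return [v + "_refined" for v in views]        # stub: clean/densify the renders

def refit(images, extra_views):
    return {"points": 100 * (len(images) + len(extra_views))}  # stub: refit with extras

def flowr_style_loop(images, rounds=2):
    recon = mast3r_reconstruct(images)
    for _ in range(rounds):
        views = flow_matching_refine(render_novel_views(recon))
        recon = refit(images, views)              # synthetic views densify the fit
    return recon

print(flowr_style_loop(["img0", "img1", "img2"]))  # {'points': 700}
```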
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. (1/n)
56 replies · 330 reposts · 2K likes
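The core swap being announced: train the diffusion transformer in the latent space of a frozen pretrained representation encoder, with a learned decoder back to pixels, instead of using a VAE. A shape-level sketch with toy modules standing in for the encoder and decoder; nothing here is the RAE reference implementation:

```python
import torch
import torch.nn as nn

# Toy stand-ins for a frozen representation encoder (e.g. a DINO-style ViT in
# the real work) and a learned pixel decoder.

class FrozenEncoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        for p in self.parameters():
            p.requires_grad_(False)  # the representation encoder stays frozen

    def forward(self, x):            # (B, 3, 256, 256) -> (B, 256, 16, 16)
        return self.proj(x)

class Decoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.up = nn.ConvTranspose2d(dim, 3, kernel_size=16, stride=16)

    def forward(self, z):
        return self.up(z)

enc, dec = FrozenEncoder(), Decoder()
x = torch.rand(2, 3, 256, 256)
z = enc(x)        # the diffusion transformer would be trained to denoise z
x_hat = dec(z)    # the decoder maps (denoised) representations back to pixels
print(z.shape, x_hat.shape)
```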
@jianyuan_wang I guess you tried the DAv3-style init strategy for VGGT back in December '24, with the main difference being that it was fully alternating (ablation c) instead of the partial one (last layers only) that DAv3 uses?
2 replies · 1 repost · 5 likes
Interesting ICLR submissions 🤩
Depth Anything 3 - My TLDR: Init the multi-view transformer of VGGT with later-layer DINO weights and use a teacher model trained on synthetic data only for pseudo-labelling real-world datasets https://t.co/1gq6WSZM5c
Trace Anything - My TLDR: VGGT
5 replies · 41 reposts · 393 likes
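The init trick summarized in the last two posts (seeding a multi-view transformer with the later layers of a pretrained ViT) reduces to copying block weights. A toy sketch with identically shaped stand-in blocks; real DAv3/VGGT layers differ, so this only shows the mechanism:

```python
import torch.nn as nn

# Toy stand-ins: a 12-block "pretrained ViT" and a 6-block multi-view
# transformer with identically shaped blocks, so state dicts transfer directly.

def make_blocks(n, dim=64):
    return nn.ModuleList(
        nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        for _ in range(n)
    )

pretrained = make_blocks(12)  # stand-in for DINO-style weights
multiview = make_blocks(6)    # stand-in for the multi-view transformer

# Seed the multi-view blocks with the *last* len(multiview) pretrained blocks.
for mv_blk, pt_blk in zip(multiview, pretrained[-len(multiview):]):
    mv_blk.load_state_dict(pt_blk.state_dict())
```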
Looks like researchers in visual localization are good reviewers: I see so many familiar names in the list of ICCV outstanding reviewers! Congrats to @alessiodelbue @masone_carlo @kgcs96 @jcivera @Parskatt Julian Kooij @maththrills @NicStrisc @Nik__V__ (and me, happy to be in
4 replies · 6 reposts · 33 likes
Glad to be recognized again as an outstanding reviewer! 🤗
There's no conference without the efforts of our reviewers. Special shoutout to our #ICCV2025 outstanding reviewers 🫡 https://t.co/WYAcXLRXla
0 replies · 0 reposts · 24 likes
We introduce RAVEN, a 3D open-set memory-based behavior tree framework for aerial outdoor semantic navigation. RAVEN not only navigates reliably toward detected targets, but also performs long-range semantic reasoning and LVLM-guided informed search.
1 reply · 7 reposts · 21 likes
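A behavior tree of the kind RAVEN describes composes condition and action nodes with sequence/fallback control flow. Here is a toy sketch of that control flow, with the semantic-memory lookup and LVLM-guided search reduced to placeholder nodes; none of this is RAVEN's actual interface:

```python
# Toy behavior-tree control flow with placeholder leaf nodes.

class Sequence:
    def __init__(self, *children): self.children = children
    def tick(self, ctx):
        for c in self.children:
            if c.tick(ctx) != "SUCCESS":
                return "FAILURE"
        return "SUCCESS"

class Fallback:
    def __init__(self, *children): self.children = children
    def tick(self, ctx):
        for c in self.children:
            if c.tick(ctx) == "SUCCESS":
                return "SUCCESS"
        return "FAILURE"

class TargetInMemory:                 # condition: 3D open-set memory lookup
    def tick(self, ctx):
        return "SUCCESS" if ctx["memory"].get(ctx["goal"]) else "FAILURE"

class NavigateToTarget:               # action: go to the remembered location
    def tick(self, ctx):
        print("navigating to", ctx["memory"][ctx["goal"]])
        return "SUCCESS"

class LVLMInformedSearch:             # fallback: LVLM-guided informed search
    def tick(self, ctx):
        print("exploring region suggested by the LVLM for", ctx["goal"])
        return "SUCCESS"

tree = Fallback(Sequence(TargetInMemory(), NavigateToTarget()), LVLMInformedSearch())
tree.tick({"memory": {"red car": (12.0, 3.5, 0.0)}, "goal": "red car"})
```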
📢 SceneComp @ ICCV 2025
Generative Scene Completion for Immersive Worlds
🛠️ Reconstruct what you know AND 🪄 Generate what you don't!
Meet our speakers @angelaqdai, @holynski_, @jampani_varun, @ZGojcic @taiyasaki, Peter Kontschieder https://t.co/LvONYIK3dz
#ICCV2025
2 replies · 17 reposts · 52 likes
Show off your own Hyperscapes! Thread 🧵 below 👇 showing off various user-created Hyperscapes! They all look SOOOO good! Add your video in the comments or @ me, and I will add it to the thread!
7 replies · 20 reposts · 192 likes
CameraBench has been accepted as a Spotlight (3%) @ NeurIPS 2025. Huge congrats to all collaborators at CMU, MIT-IBM, UMass, Harvard, and Adobe. CameraBench is a large-scale effort that pushes video-language models to reason about the language of camera motion just like
arxiv.org: We introduce CameraBench, a large-scale dataset and benchmark designed to assess and improve camera motion understanding. CameraBench consists of ~3,000 diverse internet videos, annotated by...
📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video: films, games, drone shots, vlogs, etc. Links
7 replies · 23 reposts · 162 likes
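For intuition on the "geometry" side of camera motion understanding, here is a crude sketch that labels the motion between two camera poses as rotation-dominant, translation-dominant, static, or mixed; the thresholds and labels are invented and this is not CameraBench's taxonomy or code:

```python
import numpy as np

def classify_motion(R0, R1, t0, t1, rot_eps=1e-3, trans_eps=1e-3):
    dR = R1 @ R0.T                                        # relative rotation
    angle = np.arccos(np.clip((np.trace(dR) - 1) / 2, -1.0, 1.0))
    trans = np.linalg.norm(t1 - t0)                       # relative translation
    if angle > rot_eps and trans <= trans_eps:
        return "rotation-dominant (pan/tilt/roll)"
    if trans > trans_eps and angle <= rot_eps:
        return "translation-dominant (dolly/truck/pedestal)"
    return "static" if angle <= rot_eps else "mixed"

def rot_y(a):  # rotation about the camera's y-axis, i.e. a pan
    return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])

print(classify_motion(np.eye(3), rot_y(0.1), np.zeros(3), np.zeros(3)))
```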
Introducing: Hyperscape Capture 📷 Last year we showed the world's highest quality Gaussian Splatting, and the first time GS was viewable in VR. Now, capture your own Hyperscapes, directly from your Quest headset in only 5 minutes of walking around. https://t.co/wlHmtRiANy
Hyperscape: The future of VR and the Metaverse Excited that Zuckerberg @finkd announced what I have been working on at Connect. Hyperscape enables people to create high fidelity replicas of physical spaces, and embody them in VR. Check out the demo app: https://t.co/TcRRUfymoc
41 replies · 284 reposts · 2K likes
Awesome thread; always fun to see @ducha_aiki stress testing things!
MapAnything vs VGGT thread. Every tweet -- 4 results on the same input: VGGT, VGGT-Apache, MapAnything, MapAnything-Apache Starting with classics, graffiti from OxfordAffine. Top-down to see the wall flatness. Spoiler: none are flat 1/
1 reply · 2 reposts · 5 likes
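The flatness check used in that thread can be made quantitative: fit a plane to the reconstructed wall points and report the RMS out-of-plane residual. A small sketch of that measurement, with synthetic data standing in for the actual reconstructions:

```python
import numpy as np

def plane_rms(points):                                    # points: (N, 3)
    centered = points - points.mean(axis=0)
    # Smallest right singular vector of the centered points = plane normal.
    normal = np.linalg.svd(centered, full_matrices=False)[2][-1]
    return np.sqrt(np.mean((centered @ normal) ** 2))

wall = np.random.rand(1000, 3) * np.array([2.0, 1.0, 0.0])  # perfectly flat wall
bumpy = wall + np.random.normal(0.0, 0.01, wall.shape)
print(plane_rms(wall), plane_rms(bumpy))                    # ~0 flat, ~0.01 bumpy
```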
Forget pointmaps, pinhole, image-only, up-to-scale… and meet MapAnything! Truly amazing, long-term effort led by @Nik__V__ that we're super excited to share today!
0 replies · 5 reposts · 82 likes