Huan Ling
@HuanLing6
Followers 1K · Following 269 · Media 26 · Statuses 164
Senior Research Scientist & Hiring Manager at NVIDIA SIL (Spatial Intelligence Lab). We are recruiting, please reach out. Opinions are my own.
Joined April 2019
Gemini 3 is now available in anycoder on Hugging Face. anycoder has a new UI with built-in support for Gradio, Streamlit, transformers.js, ComfyUI, and React apps. More soon.
Check out Cosmos 2.5 on Hugging Face!
We are hiring PhD interns starting January 2026. Please reach out if you are interested in working with us at NVIDIA SIL (Spatial Intelligence Lab, the old Toronto AI Lab). Topics include world models, RL post-training, and multi-modal generative models.
Exactly: ChronoEdit shares a similar intuition with the recently popular Veo 3-based paper ( https://t.co/OCFmuqgwvT). However, why do we need to denoise the full video for an image editing task? In ChronoEdit, we introduce a temporal reasoning stage, similar to a
video-zero-shot.github.io
Video models like Veo 3 are on a path to become vision foundation models.
Zero-shot video reasoning (chain-of-frames) isn’t just for Veo3 — open-source models can understand and edit too! 🕹️ ChronoEdit brings temporal reasoning to image editing. 🔗 https://t.co/6pyTDfzmGH
This work is a great collaboration at @NVIDIAAI by @jayzhangjiewu @xuanchi13 @TianchangS @TianshiC @Kai__He @YifanLu17525599 @ruiyuan_gao @xieenze_jr @voidrank Jose M. Alvarez @JunGao33210520 @FidlerSanja @zianwang97 @HuanLing6 Models and code will be released in the coming
3 - Overview of the ChronoEdit pipeline: From right to left, the denoising process begins in the temporal reasoning stage, where the model imagines and denoises a short trajectory of intermediate frames. These intermediate frames act as reasoning tokens, guiding how the edit
2 - ChronoEdit introduces temporal reasoning tokens. If the video reasoning tokens are fully denoised into a clean video, the model can illustrate how it “thinks” by visualizing intermediate frames as a reasoning trajectory, though at the expense of slower inference. Notably, an
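To make the two-stage sampling concrete, here is a minimal sketch in PyTorch. Everything in it is a placeholder standing in for the pretrained video diffusion backbone: the toy denoiser, function names, and step counts are assumptions for illustration, not ChronoEdit's released API.

```python
import torch

def denoiser(latents, sigma, prompt_emb):
    # Placeholder for the pretrained video diffusion backbone: given noisy
    # frame latents at noise level sigma, return a less-noisy estimate.
    return latents * (1.0 - 0.05 * sigma)  # toy dynamics, not a real model

def chronoedit_sample(input_latent, prompt_emb, n_reason=6,
                      reason_steps=10, edit_steps=40, keep_trajectory=False):
    sigmas = torch.linspace(1.0, 0.0, reason_steps + edit_steps)
    # Frame 0: the clean input-image latent. Frames 1..n_reason: reasoning
    # tokens. Last frame: the target (edited) image. Non-input frames start as noise.
    video = torch.randn(n_reason + 2, *input_latent.shape)
    video[0] = input_latent

    # Stage 1 (temporal reasoning): jointly denoise the short trajectory for a
    # few steps so the intermediate frames can "imagine" how the edit unfolds.
    for i in range(reason_steps):
        video[1:] = denoiser(video[1:], sigmas[i].item(), prompt_emb)

    if keep_trajectory:
        # Fully denoise every frame instead: slower, but the intermediate
        # frames become a visible reasoning trajectory showing how the model "thinks".
        for i in range(reason_steps, reason_steps + edit_steps):
            video[1:] = denoiser(video[1:], sigmas[i].item(), prompt_emb)
        return video[-1], video[1:-1]

    # Stage 2 (editing): drop the reasoning tokens and finish denoising only
    # the target frame, avoiding the cost of denoising the full video.
    target = video[-1:]
    for i in range(reason_steps, reason_steps + edit_steps):
        target = denoiser(target, sigmas[i].item(), prompt_emb)
    return target[0], None

edited, trajectory = chronoedit_sample(torch.randn(16, 32, 32), prompt_emb=None)
```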
1 - ChronoEdit repurposes pretrained video generative models for editing by reframing the task as a two-frame video generation problem, where the input image and its edited version are modeled as consecutive frames. When fine-tuned with curated image-editing data, this two-frame
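A sketch of the two-frame framing itself, assuming latents of shape (C, H, W) with hypothetical helper names: the edit becomes generating frame 1 of a two-frame clip whose frame 0 is pinned to the input image.

```python
import torch

def make_two_frame_batch(input_latent: torch.Tensor) -> torch.Tensor:
    # Inference: frame 0 is fixed to the input image's latent; frame 1 starts
    # as pure noise and is denoised into the edited image.
    return torch.stack([input_latent, torch.randn_like(input_latent)])

def make_training_pair(input_latent: torch.Tensor,
                       edited_latent: torch.Tensor) -> torch.Tensor:
    # Fine-tuning on curated editing data: the input image and its edited
    # version are modeled as two consecutive frames of a tiny "video".
    return torch.stack([input_latent, edited_latent])

clip = make_two_frame_batch(torch.randn(16, 32, 32))  # shape (2, 16, 32, 32)
```

The appeal of this framing is that the video prior already encodes plausible change between consecutive frames, which is exactly the consistency an edit should preserve.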
🕹️We are excited to introduce "ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation" ChronoEdit reframes image editing as a video generation task to encourage temporal consistency. It leverages a temporal reasoning stage that denoises with “video
Thank you AK for featuring Lyra. The Lyra model is released on Hugging Face; check it out at
huggingface.co
Nvidia just released Lyra on Hugging Face: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation. TL;DR: feed-forward 3D and 4D scene generation from a single image/video, trained with synthetic data generated by a camera-controlled video diffusion model.
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a
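A hedged sketch of the self-distillation recipe as the TL;DR describes it, with every module a toy placeholder rather than Lyra's code: the camera-controlled video diffusion model plays teacher by synthesizing views along sampled camera paths, and a feed-forward student learns to map one image to a 3D representation that re-renders those views.

```python
import torch

class FeedForward3D(torch.nn.Module):
    # Stands in for the feed-forward image -> 3D encoder (e.g. predicting a
    # Gaussian scene representation); a single linear map for illustration.
    def __init__(self, hw=32):
        super().__init__()
        self.net = torch.nn.Linear(3 * hw * hw, 3 * hw * hw)

    def forward(self, image):
        return self.net(image.flatten(1))  # "scene code"

def render(scene, cams, hw=32):
    # Placeholder differentiable renderer: scene code -> one image per camera.
    return scene.view(1, 3, hw, hw).expand(len(cams), -1, -1, -1)

@torch.no_grad()
def teacher_synthesize(image, cams):
    # Placeholder for the camera-controlled video diffusion model rendering
    # the scene along a sampled camera trajectory (the synthetic training data).
    return image.expand(len(cams), -1, -1, -1)

student = FeedForward3D()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
image = torch.rand(1, 3, 32, 32)
cams = torch.eye(4).repeat(8, 1, 1)  # 8 sampled camera poses

for step in range(3):
    targets = teacher_synthesize(image, cams)  # synthetic multi-view frames
    scene = student(image)                     # single feed-forward pass
    loss = torch.nn.functional.mse_loss(render(scene, cams), targets)
    opt.zero_grad(); loss.backward(); opt.step()
```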
Amazing work from @huangjh_hjh and the team!! If you want to build a Genie-3-like model, check out ViPE, the SOTA video pose engine, which estimates camera parameters and dense metric depth from in-the-wild videos. Code is released. We will also release our annotated video dataset
[1/N] 🎥 We've made available a powerful spatial AI tool named ViPE: Video Pose Engine, to recover camera motion, intrinsics, and dense metric depth from casual videos! Running at 3–5 FPS, ViPE handles cinematic shots, dashcams, and even 360° panoramas. 🔗 https://t.co/1mGDxwgYJt
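For context on what a pose engine's outputs enable, here is a small sketch; the variable layout is an assumption, not ViPE's documented interface. Given a per-frame pose, shared intrinsics, and a metric depth map, a frame back-projects to a metric point cloud:

```python
import numpy as np

def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift a metric depth map (H, W) to camera-space 3D points via the
    pinhole model, using intrinsics K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Assumed per-frame outputs (placeholder values): a 4x4 world-from-camera
# pose, a 3x3 intrinsics matrix, and an (H, W) metric depth map.
pose = np.eye(4)
K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
depth = np.full((480, 640), 2.0)

pts_cam = backproject(depth, K)                          # camera-space points (metres)
pts_world = (pose[:3, :3] @ pts_cam.T).T + pose[:3, 3]   # into world coordinates
```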
🚀 Difix3D+ is now open-sourced! Check out the code and try the demo: https://t.co/O7p7eUzxSz We're presenting at #CVPR2025 this Sunday, June 15 — come say hi! 🗣️ Oral: 1:00–1:15 PM CDT, Karl Dean Grand Ballroom 🖼️ Poster: 4:00–6:00 PM CDT, ExHall D (Poster #57)
github.com
[CVPR 2025 Oral & Award Candidate] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models - nv-tlabs/Difix3D
Excited to share our #CVPR2025 paper: Difix3D+ Difix3D+ reimagines 3D reconstruction with single-step diffusion, distilling 2D generative priors for realistic novel view synthesis from large viewpoint shifts. 📄Paper: https://t.co/2qk0LP16Di 🌐Website: https://t.co/5O5XZWoJ5E
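The core idea, sketched below with a toy stand-in model rather than the released Difix3D+ weights or API: render a novel view from the imperfect 3D reconstruction, then let a distilled single-step diffusion model repair the artifacts in one forward pass, cheap enough to run per view and to feed the cleaned views back into the reconstruction.

```python
import torch

class OneStepFixer(torch.nn.Module):
    # Stands in for the distilled single-step diffusion U-Net.
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

    @torch.no_grad()
    def forward(self, render: torch.Tensor) -> torch.Tensor:
        # A distilled one-step model predicts the clean image directly from
        # the degraded render: no iterative denoising loop.
        return render + self.net(render)  # residual correction

fixer = OneStepFixer()
novel_view = torch.rand(1, 3, 256, 256)  # artifact-laden render from the 3D model
clean_view = fixer(novel_view)           # repaired in a single forward pass
```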
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly
We show that Cosmos-Drive-Dreams improves several AV tasks: 3D detection, mapping, and trajectory prediction. See the paper for details!
[Toolkit for developers] Furthermore, we open-source a toolkit with rendering scripts, distributed SDG utilities, and a WebUI-based 3D trajectory editing tool.
[Open-source massive dataset] Featuring 81,802 synthetic clips paired with structured HD map & LiDAR labels, based on 5,843 collected clips. And yes, the dataset is also open-sourced, at
huggingface.co
[Built on World Foundation Model] The core of our SDG pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos world foundation model for the driving domain, capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving
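A sketch of the SDG loop as the thread describes it, with hypothetical names throughout: condition the driving-specialized world model on a structured layout (HD map + LiDAR) plus a scenario prompt, sample a multi-view clip, and keep the conditioning layout as free ground-truth labels.

```python
import torch

def world_model_sample(layout: dict, prompt: str, n_views: int = 3,
                       n_frames: int = 16) -> torch.Tensor:
    # Placeholder generator; the real component is a controllable video
    # diffusion model specialized from Cosmos for the driving domain.
    return torch.rand(n_views, n_frames, 3, 128, 128)

def generate_clip(hd_map, lidar, scenario_prompt):
    layout = {"hd_map": hd_map, "lidar": lidar}
    views = world_model_sample(layout, scenario_prompt)
    # Labels come for free: the conditioning layout doubles as ground truth
    # for downstream AV tasks (3D detection, mapping, trajectory prediction).
    return {"video": views, "labels": layout, "prompt": scenario_prompt}

clip = generate_clip(hd_map=torch.rand(1, 64, 64), lidar=torch.rand(1024, 4),
                     scenario_prompt="rainy night, dense traffic")
```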
We are excited to share Cosmos-Drive-Dreams 🚀 A bold new synthetic data generation (SDG) pipeline powered by world foundation models, designed to synthesize rich, challenging driving scenarios at scale. Models, code, dataset, and toolkit are released. Website: