Huan Ling Profile
Huan Ling

@HuanLing6

Followers
1K
Following
269
Media
26
Statuses
164

Senior Research Scientist & Hiring Manager at NVIDIA SIL (Spatial Intelligence Lab). We are recruiting; please reach out. Opinions are my own.

Joined April 2019
@_akhaliq
AK
4 days
Gemini 3 is now available in anycoder on Hugging Face. anycoder has a new UI with built-in support for Gradio, Streamlit, transformers.js, ComfyUI, and React apps. More soon.
4
5
40
@HuanLing6
Huan Ling
18 days
Check out Cosmos 2.5 on Hugging Face!
@_akhaliq
AK
18 days
World Simulation with Video Foundation Models for Physical AI
0
0
12
@HuanLing6
Huan Ling
1 month
It's a fully remote opportunity.
1
0
14
@HuanLing6
Huan Ling
1 month
We are hiring PhD interns starting January 2026. Please reach out if you are interested in working with us at NVIDIA SIL (Spatial Intelligence Lab, Old Toronto AI). Topics include world models, RL post-training, and multi-modal generative models.
21
91
566
@HuanLing6
Huan Ling
2 months
Exactly — ChronoEdit shares a similar intuition with the recently popular VEO3-based paper ( https://t.co/OCFmuqgwvT). However, why do we need to denoise the full video for an image editing task? In ChronoEdit, we introduce a temporal reasoning stage, similar to a
video-zero-shot.github.io
Video models like Veo 3 are on a path to become vision foundation models.
@xuanchi13
Xuanchi Ren
2 months
Zero-shot video reasoning (chain-of-frames) isn’t just for Veo3 — open-source models can understand and edit too! 🕹️ ChronoEdit brings temporal reasoning to image editing. 🔗 https://t.co/6pyTDfzmGH
1
6
48
@HuanLing6
Huan Ling
2 months
This work is a great collaboration at @NVIDIAAI by @jayzhangjiewu @xuanchi13 @TianchangS @TianshiC @Kai__He @YifanLu17525599 @ruiyuan_gao @xieenze_jr @voidrank Jose M. Alvarez @JunGao33210520 @FidlerSanja @zianwang97 @HuanLing6 Models and code will be released in the coming
0
0
9
@HuanLing6
Huan Ling
2 months
3 - Overview of the ChronoEdit pipeline: From right to left, the denoising process begins in the temporal reasoning stage, where the model imagines and denoises a short trajectory of intermediate frames. These intermediate frames act as reasoning tokens, guiding how the edit
0
0
8
@HuanLing6
Huan Ling
2 months
2 - ChronoEdit introduces temporal reasoning tokens. If the video reasoning tokens are fully denoised into a clean video, the model can illustrate how it “thinks” by visualizing intermediate frames as a reasoning trajectory—though at the expense of slower inference. Notably, an
0
1
7
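The idea of intermediate frames acting as a visible reasoning trajectory can be illustrated with a toy sketch (all names here are hypothetical; in ChronoEdit these tokens are denoised by the video model itself, not produced by linear blending):

```python
import numpy as np

def reasoning_trajectory(frame0, frame1, num_tokens=3):
    """Toy illustration: "reasoning token" frames between the input frame
    and the edited frame, here approximated as linear blends. In ChronoEdit
    these intermediate frames are denoised by the video backbone and can be
    decoded into a visible trajectory, at extra inference cost."""
    # Interior blend weights, excluding the two endpoint frames.
    alphas = np.linspace(0.0, 1.0, num_tokens + 2)[1:-1]
    return [frame0 * (1 - a) + frame1 * a for a in alphas]

tokens = reasoning_trajectory(np.zeros((2, 2)), np.ones((2, 2)), num_tokens=3)
```

Here `tokens` holds three intermediate frames that interpolate from the input toward the edit, standing in for the denoised reasoning trajectory.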
@HuanLing6
Huan Ling
2 months
1 - ChronoEdit repurposes pretrained video generative models for editing by reframing the task as a two-frame video generation problem, where the input image and its edited version are modeled as consecutive frames. When fine-tuned with curated image-editing data, this two-frame
0
0
8
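The two-frame reframing can be sketched as a toy loop (hypothetical names throughout; a real ChronoEdit model would denoise frame 1 with a pretrained video diffusion backbone rather than this linear stand-in update):

```python
import numpy as np

def edit_as_two_frame_video(input_image, edit_target, steps=6, rng=None):
    """Toy sketch of the two-frame reframing: the input image and its edited
    version are modeled as consecutive video frames. Frame 0 is the clean
    conditioning frame; frame 1 starts as noise and is iteratively "denoised"
    toward the edit. The update rule below is a stand-in, not the real model."""
    rng = rng or np.random.default_rng(0)
    frame0 = input_image                          # conditioning frame, kept clean
    frame1 = rng.normal(size=input_image.shape)   # edited frame starts as pure noise
    for _ in range(steps):
        # Toy "denoising" step: move frame 1 halfway toward the edit target.
        frame1 = frame1 + 0.5 * (edit_target - frame1)
    return frame0, frame1

img = np.ones((4, 4))
f0, f1 = edit_as_two_frame_video(img, edit_target=0.8 * img)
```

After a few steps, frame 1 converges toward the edit target while frame 0 stays untouched, mirroring how the conditioning frame anchors the edit in the two-frame formulation.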
@HuanLing6
Huan Ling
2 months
🕹️We are excited to introduce "ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation" ChronoEdit reframes image editing as a video generation task to encourage temporal consistency. It leverages a temporal reasoning stage that denoises with “video
6
37
142
@HuanLing6
Huan Ling
2 months
Thank you AK for featuring Lyra. The Lyra model is released on Hugging Face. Check it out at
huggingface.co
@_akhaliq
AK
2 months
Nvidia just released Lyra on Hugging Face Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation TL;DR: Feed-forward 3D and 4D scene generation from a single image/video trained with synthetic data generated by a camera-controlled video diffusion model
1
4
27
@sherwinbahmani
Sherwin Bahmani
2 months
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a
19
70
250
@HuanLing6
Huan Ling
3 months
Amazing work from @huangjh_hjh and the team!! If you want to build a Genie 3-like model, check out ViPE, the SOTA video pose engine, which estimates camera parameters and dense metric depth from in-the-wild videos. Code is released. We will also release our annotated video dataset
@huangjh_hjh
Jiahui Huang
3 months
[1/N] 🎥 We've made available a powerful spatial AI tool named ViPE: Video Pose Engine, to recover camera motion, intrinsics, and dense metric depth from casual videos! Running at 3–5 FPS, ViPE handles cinematic shots, dashcams, and even 360° panoramas. 🔗 https://t.co/1mGDxwgYJt
0
0
16
@jayzhangjiewu
Jay Wu
5 months
🚀 Difix3D+ is now open-sourced! Check out the code and try the demo: https://t.co/O7p7eUzxSz We're presenting at #CVPR2025 this Sunday, June 15 — come say hi! 🗣️ Oral: 1:00–1:15 PM CDT, Karl Dean Grand Ballroom 🖼️ Poster: 4:00–6:00 PM CDT, ExHall D (Poster #57)
github.com
[CVPR 2025 Oral & Award Candidate] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models - nv-tlabs/Difix3D
@jayzhangjiewu
Jay Wu
9 months
Excited to share our #CVPR2025 paper: Difix3D+ Difix3D+ reimagines 3D reconstruction with single-step diffusion, distilling 2D generative priors for realistic novel view synthesis from large viewpoint shifts. 📄Paper: https://t.co/2qk0LP16Di 🌐Website: https://t.co/5O5XZWoJ5E
0
16
72
@qsh_zh
Qinsheng Zhang
5 months
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly
7
62
204
@HuanLing6
Huan Ling
5 months
We show Cosmos-Drive-Dreams improves several AV tasks: 3D detection, mapping, and trajectory prediction. See the paper for details!
0
0
3
@HuanLing6
Huan Ling
5 months
[Toolkit for developers] Furthermore, we open-source a toolkit with rendering scripts, distributed SDG utilities, and a WebUI-based 3D trajectory editing tool.
0
0
5
@HuanLing6
Huan Ling
5 months
[Open Source Massive Dataset] Featuring 81,802 synthetic clips paired with structured HD map & LiDAR labels, based on 5,843 collected clips. And yes, they've also been open-sourced. At
huggingface.co
0
0
7
@HuanLing6
Huan Ling
5 months
[Built on World Foundation Model] The core of our SDG pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos world foundation model for the driving domain, capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving
0
0
4
@HuanLing6
Huan Ling
5 months
We are excited to share Cosmos-Drive-Dreams 🚀 A bold new synthetic data generation (SDG) pipeline powered by world foundation models, designed to synthesize rich, challenging driving scenarios at scale. Models, code, dataset, and toolkit are released. Website:
11
44
107