Huan Ling
@HuanLing6
Followers 1K · Following 269 · Media 26 · Statuses 164
Senior Research Scientist & Hiring Manager at NVIDIA SIL (Spatial Intelligence Lab). We are recruiting, please reach out. Opinions are my own.
Joined April 2019
Gemini 3 is now available in anycoder on Hugging Face. anycoder has a new UI with built-in support for Gradio, Streamlit, transformers.js, ComfyUI, and React apps. More soon.
Check out Cosmos 2.5 on Hugging Face!
We are hiring PhD interns starting January 2026. Please reach out if you are interested in working with us at NVIDIA SIL (Spatial Intelligence Lab, the old Toronto AI Lab). Topics include world models, RL post-training, and multi-modal generative models.
Exactly: ChronoEdit shares a similar intuition with the recently popular Veo 3-based paper ( https://t.co/OCFmuqgwvT). However, why do we need to denoise the full video for an image editing task? In ChronoEdit, we introduce a temporal reasoning stage, similar to a
video-zero-shot.github.io
Video models like Veo 3 are on a path to become vision foundation models.
Zero-shot video reasoning (chain-of-frames) isn’t just for Veo3 — open-source models can understand and edit too! 🕹️ ChronoEdit brings temporal reasoning to image editing. 🔗 https://t.co/6pyTDfzmGH
This work is a great collaboration at @NVIDIAAI by @jayzhangjiewu @xuanchi13 @TianchangS @TianshiC @Kai__He @YifanLu17525599 @ruiyuan_gao @xieenze_jr @voidrank Jose M. Alvarez @JunGao33210520 @FidlerSanja @zianwang97 @HuanLing6 Models and code will be released in the coming
3 - Overview of the ChronoEdit pipeline: From right to left, the denoising process begins in the temporal reasoning stage, where the model imagines and denoises a short trajectory of intermediate frames. These intermediate frames act as reasoning tokens, guiding how the edit
2 - ChronoEdit introduces temporal reasoning tokens. If the video reasoning tokens are fully denoised into a clean video, the model can illustrate how it “thinks” by visualizing intermediate frames as a reasoning trajectory, though at the expense of slower inference. Notably, an
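To make the two-stage sampling concrete, here is a minimal sketch in PyTorch. Everything in it is a placeholder standing in for the pretrained video diffusion backbone: the toy denoiser, function names, and step counts are assumptions for illustration, not ChronoEdit's released API.

```python
import torch

def denoiser(latents, sigma, prompt_emb):
    # Placeholder for the pretrained video diffusion backbone: given noisy
    # frame latents at noise level sigma, return a less-noisy estimate.
    return latents * (1.0 - 0.05 * sigma)  # toy dynamics, not a real model

def chronoedit_sample(input_latent, prompt_emb, n_reason=6,
                      reason_steps=10, edit_steps=40, keep_trajectory=False):
    sigmas = torch.linspace(1.0, 0.0, reason_steps + edit_steps)
    # Frame 0: the clean input-image latent. Frames 1..n_reason: reasoning
    # tokens. Last frame: the target (edited) image. Non-input frames start as noise.
    video = torch.randn(n_reason + 2, *input_latent.shape)
    video[0] = input_latent

    # Stage 1 (temporal reasoning): jointly denoise the short trajectory for a
    # few steps so the intermediate frames can "imagine" how the edit unfolds.
    for i in range(reason_steps):
        video[1:] = denoiser(video[1:], sigmas[i].item(), prompt_emb)

    if keep_trajectory:
        # Fully denoise every frame instead: slower, but the intermediate
        # frames become a visible reasoning trajectory showing how the model "thinks".
        for i in range(reason_steps, reason_steps + edit_steps):
            video[1:] = denoiser(video[1:], sigmas[i].item(), prompt_emb)
        return video[-1], video[1:-1]

    # Stage 2 (editing): drop the reasoning tokens and finish denoising only
    # the target frame, avoiding the cost of denoising the full video.
    target = video[-1:]
    for i in range(reason_steps, reason_steps + edit_steps):
        target = denoiser(target, sigmas[i].item(), prompt_emb)
    return target[0], None

edited, trajectory = chronoedit_sample(torch.randn(16, 32, 32), prompt_emb=None)
```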
1 - ChronoEdit repurposes pretrained video generative models for editing by reframing the task as a two-frame video generation problem, where the input image and its edited version are modeled as consecutive frames. When fine-tuned with curated image-editing data, this two-frame
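A sketch of the two-frame framing itself, assuming latents of shape (C, H, W) with hypothetical helper names: the edit becomes generating frame 1 of a two-frame clip whose frame 0 is pinned to the input image.

```python
import torch

def make_two_frame_batch(input_latent: torch.Tensor) -> torch.Tensor:
    # Inference: frame 0 is fixed to the input image's latent; frame 1 starts
    # as pure noise and is denoised into the edited image.
    return torch.stack([input_latent, torch.randn_like(input_latent)])

def make_training_pair(input_latent: torch.Tensor,
                       edited_latent: torch.Tensor) -> torch.Tensor:
    # Fine-tuning on curated editing data: the input image and its edited
    # version are modeled as two consecutive frames of a tiny "video".
    return torch.stack([input_latent, edited_latent])

clip = make_two_frame_batch(torch.randn(16, 32, 32))  # shape (2, 16, 32, 32)
```

The appeal of this framing is that the video prior already encodes plausible change between consecutive frames, which is exactly the consistency an edit should preserve.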
🕹️We are excited to introduce "ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation" ChronoEdit reframes image editing as a video generation task to encourage temporal consistency. It leverages a temporal reasoning stage that denoises with “video
Thank you AK for featuring Lyra. The Lyra model is released on Hugging Face; check it out at
huggingface.co
Nvidia just released Lyra on Hugging Face: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation. TL;DR: feed-forward 3D and 4D scene generation from a single image/video, trained with synthetic data generated by a camera-controlled video diffusion model.
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a
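A hedged sketch of the self-distillation recipe as the TL;DR describes it, with every module a toy placeholder rather than Lyra's code: the camera-controlled video diffusion model plays teacher by synthesizing views along sampled camera paths, and a feed-forward student learns to map one image to a 3D representation that re-renders those views.

```python
import torch

class FeedForward3D(torch.nn.Module):
    # Stands in for the feed-forward image -> 3D encoder (e.g. predicting a
    # Gaussian scene representation); a single linear map for illustration.
    def __init__(self, hw=32):
        super().__init__()
        self.net = torch.nn.Linear(3 * hw * hw, 3 * hw * hw)

    def forward(self, image):
        return self.net(image.flatten(1))  # "scene code"

def render(scene, cams, hw=32):
    # Placeholder differentiable renderer: scene code -> one image per camera.
    return scene.view(1, 3, hw, hw).expand(len(cams), -1, -1, -1)

@torch.no_grad()
def teacher_synthesize(image, cams):
    # Placeholder for the camera-controlled video diffusion model rendering
    # the scene along a sampled camera trajectory (the synthetic training data).
    return image.expand(len(cams), -1, -1, -1)

student = FeedForward3D()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
image = torch.rand(1, 3, 32, 32)
cams = torch.eye(4).repeat(8, 1, 1)  # 8 sampled camera poses

for step in range(3):
    targets = teacher_synthesize(image, cams)  # synthetic multi-view frames
    scene = student(image)                     # single feed-forward pass
    loss = torch.nn.functional.mse_loss(render(scene, cams), targets)
    opt.zero_grad(); loss.backward(); opt.step()
```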
Amazing work from @huangjh_hjh and the team!! If you want to build a Genie-3-like model, check out ViPE, the SOTA video pose engine, which estimates camera parameters and dense metric depth from in-the-wild videos. Code is released. We will also release our annotated video dataset
[1/N] 🎥 We've made available a powerful spatial AI tool named ViPE: Video Pose Engine, to recover camera motion, intrinsics, and dense metric depth from casual videos! Running at 3–5 FPS, ViPE handles cinematic shots, dashcams, and even 360° panoramas. 🔗 https://t.co/1mGDxwgYJt
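For context on what a pose engine's outputs enable, here is a small sketch; the variable layout is an assumption, not ViPE's documented interface. Given a per-frame pose, shared intrinsics, and a metric depth map, a frame back-projects to a metric point cloud:

```python
import numpy as np

def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift a metric depth map (H, W) to camera-space 3D points via the
    pinhole model, using intrinsics K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Assumed per-frame outputs (placeholder values): a 4x4 world-from-camera
# pose, a 3x3 intrinsics matrix, and an (H, W) metric depth map.
pose = np.eye(4)
K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
depth = np.full((480, 640), 2.0)

pts_cam = backproject(depth, K)                          # camera-space points (metres)
pts_world = (pose[:3, :3] @ pts_cam.T).T + pose[:3, 3]   # into world coordinates
```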
🚀 Difix3D+ is now open-sourced! Check out the code and try the demo: https://t.co/O7p7eUzxSz We're presenting at #CVPR2025 this Sunday, June 15 — come say hi! 🗣️ Oral: 1:00–1:15 PM CDT, Karl Dean Grand Ballroom 🖼️ Poster: 4:00–6:00 PM CDT, ExHall D (Poster #57)
github.com
[CVPR 2025 Oral & Award Candidate] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models - nv-tlabs/Difix3D
Excited to share our #CVPR2025 paper: Difix3D+ Difix3D+ reimagines 3D reconstruction with single-step diffusion, distilling 2D generative priors for realistic novel view synthesis from large viewpoint shifts. 📄Paper: https://t.co/2qk0LP16Di 🌐Website: https://t.co/5O5XZWoJ5E
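The core idea, sketched below with a toy stand-in model rather than the released Difix3D+ weights or API: render a novel view from the imperfect 3D reconstruction, then let a distilled single-step diffusion model repair the artifacts in one forward pass, cheap enough to run per view and to feed the cleaned views back into the reconstruction.

```python
import torch

class OneStepFixer(torch.nn.Module):
    # Stands in for the distilled single-step diffusion U-Net.
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

    @torch.no_grad()
    def forward(self, render: torch.Tensor) -> torch.Tensor:
        # A distilled one-step model predicts the clean image directly from
        # the degraded render: no iterative denoising loop.
        return render + self.net(render)  # residual correction

fixer = OneStepFixer()
novel_view = torch.rand(1, 3, 256, 256)  # artifact-laden render from the 3D model
clean_view = fixer(novel_view)           # repaired in a single forward pass
```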
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly
We show that Cosmos-Drive-Dreams improves several AV tasks: 3D detection, mapping, and trajectory prediction. See the paper for details!
[Toolkit for developers] Furthermore, we open-source a toolkit with rendering scripts, distributed SDG utilities, and a WebUI-based 3D trajectory editing tool.
[Open-source massive dataset] Featuring 81,802 synthetic clips paired with structured HD map & LiDAR labels, based on 5,843 collected clips. And yes, the dataset is also open-sourced, at
huggingface.co
[Built on World Foundation Model] The core of our SDG pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos world foundation model for the driving domain, capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving
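A sketch of the SDG loop as the thread describes it, with hypothetical names throughout: condition the driving-specialized world model on a structured layout (HD map + LiDAR) plus a scenario prompt, sample a multi-view clip, and keep the conditioning layout as free ground-truth labels.

```python
import torch

def world_model_sample(layout: dict, prompt: str, n_views: int = 3,
                       n_frames: int = 16) -> torch.Tensor:
    # Placeholder generator; the real component is a controllable video
    # diffusion model specialized from Cosmos for the driving domain.
    return torch.rand(n_views, n_frames, 3, 128, 128)

def generate_clip(hd_map, lidar, scenario_prompt):
    layout = {"hd_map": hd_map, "lidar": lidar}
    views = world_model_sample(layout, scenario_prompt)
    # Labels come for free: the conditioning layout doubles as ground truth
    # for downstream AV tasks (3D detection, mapping, trajectory prediction).
    return {"video": views, "labels": layout, "prompt": scenario_prompt}

clip = generate_clip(hd_map=torch.rand(1, 64, 64), lidar=torch.rand(1024, 4),
                     scenario_prompt="rainy night, dense traffic")
```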
We are excited to share Cosmos-Drive-Dreams 🚀 A bold new synthetic data generation (SDG) pipeline powered by world foundation models, designed to synthesize rich, challenging driving scenarios at scale. Models, code, dataset, and toolkit are released. Website: