Xingang Pan
@XingangP
Followers 3K · Following 324 · Media 22 · Statuses 76
Assistant Professor at Nanyang Technological University @NTUsg @MMLabNTU - Computer Vision, Deep Learning, Computer Graphics
Singapore
Joined May 2018
Introducing ArtiLatent (SIGGRAPH Asia 2025), a high-quality 3D diffusion model that explicitly models object articulation, paving the way for richer, more realistic assets in embodied AI and simulation:
- Generates fully articulated 3D objects
- Physically
2 replies · 37 reposts · 167 likes
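For readers less familiar with articulated assets, the sketch below shows one common way such objects are represented: rigid parts connected by joints that carry a pivot, an axis, and motion limits, posed by rotating child parts about their hinges. It is a generic illustration in Python/NumPy, not ArtiLatent's actual representation; all class and field names are hypothetical.

    import numpy as np
    from dataclasses import dataclass, field

    @dataclass
    class Joint:
        # Revolute joint: the child part rotates about `axis` through `origin`.
        parent: int
        child: int
        origin: np.ndarray                 # (3,) pivot point
        axis: np.ndarray                   # (3,) rotation axis
        limits: tuple = (0.0, np.pi / 2)   # allowed angle range in radians

    @dataclass
    class ArticulatedObject:
        part_vertices: list                          # one (N_i, 3) vertex array per rigid part
        joints: list = field(default_factory=list)

        def pose(self, angles):
            """Return posed copies of all parts (parents stay fixed in this minimal sketch)."""
            posed = [v.copy() for v in self.part_vertices]
            for joint, theta in zip(self.joints, angles):
                theta = float(np.clip(theta, *joint.limits))
                k = joint.axis / np.linalg.norm(joint.axis)
                K = np.array([[0, -k[2], k[1]],
                              [k[2], 0, -k[0]],
                              [-k[1], k[0], 0]])
                # Rodrigues' rotation formula for a rotation of theta about k.
                R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K
                v = posed[joint.child] - joint.origin
                posed[joint.child] = v @ R.T + joint.origin
            return posed

A laptop, for instance, would be two parts (base and lid) plus one hinge joint whose limits span the opening angle; a generator that "explicitly models articulation" has to produce both the part geometry and this joint metadata.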
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
4 replies · 14 reposts · 110 likes
Cool work that connects the idea of volume rendering with image diffusion!
Our paper LaRender received full marks at ICCV 2025 and was selected as an oral! It enables training-free control of occlusion relationships among objects, as well as of visual effects, in diffusion-based image generation. Project page: https://t.co/XzjMZuJ4a4
0 replies · 1 repost · 8 likes
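The volume-rendering connection can be pictured with the classic compositing rule: if each object carries an opacity, front-to-back transmittance turns an ordered stack of per-object latents into blending weights, so raising one object's opacity makes it occlude the others. The sketch below is only that textbook rule applied to latent maps, under assumed shapes; it is not LaRender's formulation or API.

    import torch

    def composite_latents(latents, opacities):
        """Front-to-back alpha compositing of per-object latents.

        latents:   (K, C, H, W) one latent map per object, nearest object first
        opacities: (K,) values in (0, 1]; higher means more occluding
        """
        weights, transmittance = [], 1.0
        for alpha in opacities:
            weights.append(transmittance * alpha)        # w_i = T_i * alpha_i
            transmittance = transmittance * (1 - alpha)  # T_{i+1} = T_i * (1 - alpha_i)
        w = torch.stack(weights)
        w = w / w.sum()                                  # keep the blended latent at a comparable scale
        return (w.view(-1, 1, 1, 1) * latents).sum(dim=0)

    # Object 0 (opacity 0.9) dominates object 1 (opacity 0.4) wherever they overlap.
    blended = composite_latents(torch.randn(2, 4, 64, 64), torch.tensor([0.9, 0.4]))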
Introducing STream3R, a new 3D geometric foundation model for efficient 3D reconstruction from streaming input. Similar to LLMs, STream3R uses causal attention during training and a KV cache at inference. No need to worry about post-alignment or reconstructing from scratch.
Streaming-based 3D/4D Foundation Model: We present STream3R, which reformulates dense 3D/4D reconstruction into a sequential registration task with **causal attention**.
- Projects: https://t.co/zrLlvxJ0FJ
- Code: https://t.co/ONYaJDrjhF
- Model:
5 replies · 58 reposts · 320 likes
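The LLM analogy in the announcement above (causal attention for training, KV caching for inference) can be made concrete with a generic sketch: tokens of each incoming frame attend to the cached keys and values of all earlier frames, so the reconstruction state grows incrementally instead of being re-aligned from scratch. This is standard single-head causal attention with a per-stream cache, not the STream3R code; shapes and names are assumptions.

    import torch
    import torch.nn.functional as F

    class StreamingFrameAttention(torch.nn.Module):
        """Single-head attention over frame tokens with a KV cache."""

        def __init__(self, dim):
            super().__init__()
            self.to_qkv = torch.nn.Linear(dim, 3 * dim)
            self.cache_k = None   # (1, frames_seen * N, dim)
            self.cache_v = None

        @torch.no_grad()
        def step(self, frame_tokens):
            # frame_tokens: (1, N, dim) tokens of the newest frame
            q, k, v = self.to_qkv(frame_tokens).chunk(3, dim=-1)
            if self.cache_k is None:
                self.cache_k, self.cache_v = k, v
            else:
                self.cache_k = torch.cat([self.cache_k, k], dim=1)
                self.cache_v = torch.cat([self.cache_v, v], dim=1)
            # New tokens attend to every cached token (all past frames plus themselves);
            # past frames are never revisited, which is what makes the process streaming.
            return F.scaled_dot_product_attention(q, self.cache_k, self.cache_v)

    attn = StreamingFrameAttention(dim=64)
    for frame in torch.randn(5, 1, 196, 64):   # 5 frames, 196 tokens each
        out = attn.step(frame)                 # (1, 196, 64), updated frame by frame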
Grok 4 one-shots building a gemma-3-270m chatbot with transformers.js, with one-click deploy in anycoder.
9 replies · 13 reposts · 106 likes
Directly training video diffusion models on long videos faces huge memory and learning challenges. How, then, do we model the long-range temporal distribution? Our ICCV 2025 work, TokensGen, offers a solution. We compress videos into a highly condensed token space, enabling
1 reply · 25 reposts · 106 likes
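One way to picture "a highly condensed token space" is a tokenizer that aggressively downsamples a long clip in time and space before any temporal model sees it. The sketch below uses a single strided 3D convolution as a stand-in encoder with made-up sizes; it illustrates the idea only and is not TokensGen's architecture.

    import torch

    class CondensedVideoTokenizer(torch.nn.Module):
        """Downsample a clip (B, C, T, H, W) into a short sequence of tokens."""

        def __init__(self, channels=3, dim=256, t_stride=8, s_stride=16):
            super().__init__()
            # One strided 3D conv stands in for a real encoder: it reduces
            # time by t_stride and space by s_stride in a single step.
            self.encode = torch.nn.Conv3d(
                channels, dim,
                kernel_size=(t_stride, s_stride, s_stride),
                stride=(t_stride, s_stride, s_stride),
            )

        def forward(self, video):
            z = self.encode(video)                      # (B, dim, T', H', W')
            return z.flatten(2).transpose(1, 2)         # (B, T'*H'*W', dim) token sequence

    tok = CondensedVideoTokenizer()
    video = torch.randn(1, 3, 128, 256, 256)            # 128-frame clip
    tokens = tok(video)                                  # (1, 16*16*16, 256) = 4096 tokens
    # A temporal model over `tokens` now sees the whole clip at a fraction of the
    # memory cost of attending over raw frames.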
WorldMem is mainly created by @zeqi_xiao.
Project page: https://t.co/vi78xdY2TT
arXiv: https://t.co/Cu4YwGy7YP
GitHub: https://t.co/3PEcUJDYCw
Demo:
1 reply · 0 reposts · 4 likes
Synthesizing worlds with video diffusion models is often inconsistent: moving the camera back and forth leads to different scenes. We propose WorldMem, a memory-based approach that ensures consistent world simulation without relying on explicit 3D reconstruction.
While recent works like Genie 2, The Matrix, and Navigation World Models explore video generative models as world simulators, world consistency remains underexplored. In this work, we propose WorldMem, introducing a memory mechanism for long-term consistent world simulation.
2 replies · 27 reposts · 148 likes
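A simple mental model for the memory mechanism: store what has already been generated, keyed by camera pose, and retrieve the nearest entries to condition the next frames, so returning to a place reproduces it rather than inventing a new scene. The retrieval rule below (Euclidean distance on camera positions, ignoring orientation) and all names are simplifications for illustration, not WorldMem's design.

    import torch

    class FrameMemory:
        """Store generated frames keyed by camera pose; retrieve the nearest poses."""

        def __init__(self):
            self.poses = []    # each (3,) camera position (orientation omitted for brevity)
            self.frames = []   # each (C, H, W) frame or latent

        def write(self, pose, frame):
            self.poses.append(pose)
            self.frames.append(frame)

        def read(self, query_pose, k=4):
            if not self.poses:
                return []
            d = torch.stack([(p - query_pose).norm() for p in self.poses])
            idx = torch.topk(d, k=min(k, len(self.poses)), largest=False).indices
            return [self.frames[i] for i in idx]

    memory = FrameMemory()
    memory.write(torch.tensor([0.0, 0.0, 0.0]), torch.randn(4, 32, 32))
    memory.write(torch.tensor([5.0, 0.0, 0.0]), torch.randn(4, 32, 32))
    # Moving back toward the origin retrieves what was generated there before,
    # so the generator can be conditioned on it.
    context = memory.read(torch.tensor([0.5, 0.0, 0.0]), k=1)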
Diffusion models are sensitive to small changes in the input noise. We introduce Alias-Free Latent Diffusion Models (AF-LDM) at #CVPR2025. It achieves shift-equivariance and generates consistent outputs. Project: https://t.co/nehjzSFAVU arXiv: https://t.co/CksgC8A0Ph
8 replies · 63 reposts · 409 likes
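Shift-equivariance means that shifting the input should shift the output rather than change its content. The snippet below is a generic diagnostic for that property using integer circular shifts; it is not the AF-LDM method itself, just a way to measure the sensitivity the tweet refers to.

    import torch

    def shift_equivariance_error(net, x, dx=8, dy=8):
        """Relative L2 gap between net(shift(x)) and shift(net(x)) for an integer shift."""
        shifted_in = torch.roll(x, shifts=(dy, dx), dims=(-2, -1))
        out_of_shifted = net(shifted_in)
        shifted_out = torch.roll(net(x), shifts=(dy, dx), dims=(-2, -1))
        return (out_of_shifted - shifted_out).norm() / shifted_out.norm()

    # A purely convolutional net with circular padding is circularly shift-equivariant:
    net = torch.nn.Conv2d(4, 4, 3, padding=1, padding_mode="circular")
    err = shift_equivariance_error(net, torch.randn(1, 4, 64, 64))
    print(f"relative equivariance error: {err.item():.2e}")   # effectively zero for this conv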
The bokeh effect is so important in photography, yet existing text-to-image diffusion models do not support controlling bokeh strength. We introduce Bokeh Diffusion, a T2I diffusion model that supports flexible background blur control! Project: https://t.co/YlnSETImsz
1 reply · 10 reposts · 44 likes
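Scalar controls like a bokeh (blur) strength are commonly injected into diffusion models by embedding the scalar and adding it to the timestep embedding, so every denoiser block sees it. The sketch below shows that common pattern with hypothetical names and sizes; the paper's actual conditioning mechanism may differ.

    import torch

    class ScalarConditioner(torch.nn.Module):
        """Map a scalar in [0, 1] (e.g. bokeh strength) to an embedding vector."""

        def __init__(self, dim=320):
            super().__init__()
            self.mlp = torch.nn.Sequential(
                torch.nn.Linear(1, dim), torch.nn.SiLU(), torch.nn.Linear(dim, dim)
            )

        def forward(self, strength):
            return self.mlp(strength.view(-1, 1))

    cond = ScalarConditioner(dim=320)
    t_emb = torch.randn(2, 320)                   # usual timestep embedding
    bokeh = torch.tensor([0.1, 0.9])              # weak vs. strong background blur
    t_emb = t_emb + cond(bokeh)                   # the denoiser now sees the blur level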
Consistent Multi-View Diffusion for 3D Enhancement
Introducing our work #3DEnhancer @CVPR: a multi-view diffusion model that enhances multi-view images to improve 3D models.
arXiv: https://t.co/eNvgSTsKWN
Project: https://t.co/VDPG5NvRSt
1 reply · 10 reposts · 24 likes
Excited to share Neural LightRig! It allows for accurate and fast estimation of surface normals and PBR materials from just one image. We achieve this by generating multi-light images with a diffusion model, overcoming the estimation ambiguity of inverse rendering. Page:
1 reply · 21 reposts · 66 likes
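Why do multi-light images resolve the ambiguity of single-image inverse rendering? Under a Lambertian assumption, intensities observed under several known light directions determine the surface normal by least squares (classical photometric stereo). The snippet below shows only that classical step as intuition; Neural LightRig's predictors are learned, and the Lambertian model and array shapes here are illustrative assumptions.

    import numpy as np

    def photometric_stereo_normals(images, light_dirs):
        """Per-pixel normals from multi-light images (Lambertian assumption).

        images:     (L, H, W) grayscale images under L known lights
        light_dirs: (L, 3) unit light directions
        Returns (H, W, 3) unit normals.
        """
        L, H, W = images.shape
        I = images.reshape(L, -1)                              # (L, H*W)
        # Solve light_dirs @ (albedo * n) = I for every pixel at once.
        G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)     # (3, H*W)
        n = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-8)
        return n.T.reshape(H, W, 3)

    lights = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [-1, -1, 1]], dtype=float)
    lights /= np.linalg.norm(lights, axis=1, keepdims=True)
    imgs = np.random.rand(4, 64, 64)                    # stand-in for generated relit images
    normals = photometric_stereo_normals(imgs, lights)  # one unit normal per pixel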
Introducing Trajectory Attention for Fine-grained Video Motion Control. By augmenting attention along predefined trajectories, our approach enables tasks such as camera motion control for images and videos, as well as video editing.
1 reply · 11 reposts · 62 likes
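A rough sketch of "augmenting attention along predefined trajectories": gather each tracked point's features across frames and run attention within that gathered sequence, so information flows along the motion path. The gathering scheme, shapes, and the bare attention call below are assumptions for illustration, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def trajectory_attention(features, trajectories):
        """Attend along precomputed point trajectories.

        features:     (T, H, W, C) per-frame feature maps
        trajectories: (P, T, 2) integer (y, x) location of each of P tracked points per frame
        Returns (P, T, C) features updated by attention along each trajectory.
        """
        T, H, W, C = features.shape
        ys, xs = trajectories[..., 0], trajectories[..., 1]   # (P, T)
        t_idx = torch.arange(T).expand_as(ys)                 # (P, T)
        tokens = features[t_idx, ys, xs]                      # (P, T, C) features along each track
        # Self-attention within each trajectory: every time step of a track
        # exchanges information with every other time step of the same track.
        return F.scaled_dot_product_attention(tokens, tokens, tokens)

    feats = torch.randn(8, 32, 32, 64)                        # 8 frames of features
    trajs = torch.randint(0, 32, (16, 8, 2))                  # 16 tracked points
    out = trajectory_attention(feats, trajs)                  # (16, 8, 64)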
Introducing SAR3D, which tokenizes 3D objects into multiscale tokens and generates 3D objects by autoregressive next-scale prediction. SAR3D enables fast 3D generation and comprehensive 3D understanding. arXiv: https://t.co/xIKWx8o8I4 Project: https://t.co/8hUptJOubR
2 replies · 53 reposts · 240 likes
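Next-scale autoregressive prediction generates a coarse token grid first and conditions each finer grid on everything produced so far. The loop below is a heavily simplified sketch of that schedule for 3D token grids, with a stand-in model and a random codebook; it is not SAR3D's tokenizer or transformer, and in a real system each position of the next scale would get its own logits rather than sharing one distribution.

    import torch

    def next_scale_generate(model, codebook, scales=(1, 2, 4, 8), dim=256):
        """Coarse-to-fine autoregressive generation over 3D token grids.

        model:    maps a (1, N, dim) prefix of token embeddings to (1, N, V) logits
        codebook: (V, dim) embedding table shared by all scales
        """
        prefix = torch.zeros(1, 1, dim)                     # start token
        generated = []
        for s in scales:
            n = s ** 3                                      # tokens in an s x s x s grid
            logits = model(prefix)[:, -1]                   # condition on all coarser scales
            probs = torch.softmax(logits, dim=-1)
            # The stand-in model reuses one distribution for all n positions of the
            # next grid; a real next-scale model predicts per-position logits in parallel.
            ids = torch.multinomial(probs.expand(n, -1), 1).squeeze(-1)   # (n,) token ids
            generated.append(ids.view(s, s, s))
            prefix = torch.cat([prefix, codebook[ids].unsqueeze(0)], dim=1)
        return generated                                    # list of grids, coarse to fine

    V, dim = 512, 256
    codebook = torch.randn(V, dim)
    toy_model = lambda x: torch.randn(1, x.shape[1], V)     # stand-in for a transformer
    grids = next_scale_generate(toy_model, codebook, dim=dim)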
Introducing GaussianAnything, a new 3D generative model with two key properties:
- A structured point-cloud latent space enabling flexible editing!
- Support for multi-modal conditions, e.g., point cloud, text, and single/multi-view images
arXiv: https://t.co/fahQOFeDAa
9 replies · 50 reposts · 300 likes
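A structured point-cloud latent can be pictured as N points that each carry a position plus a feature vector, which is what makes edits as direct as moving or masking a subset of points before decoding. The snippet below is only that picture with made-up sizes, not GaussianAnything's latent definition.

    import torch

    # A point-cloud-structured latent: N points, each with xyz plus a C-dim feature.
    N, C = 2048, 16
    xyz = torch.rand(N, 3) * 2 - 1             # positions in [-1, 1]^3
    feats = torch.randn(N, C)
    latent = torch.cat([xyz, feats], dim=-1)   # (N, 3 + C)

    # "Flexible editing": operate on points directly, e.g. lift everything
    # in the upper half of the object by 0.2 before decoding.
    mask = latent[:, 2] > 0.0                  # points with z > 0
    edited = latent.clone()
    edited[mask, 2] += 0.2                     # a decoder would then render the edited shape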