Xingang Pan
@XingangP
Followers 3K · Following 324 · Media 22 · Statuses 76
Assistant Professor at Nanyang Technological University @NTUsg @MMLabNTU - Computer Vision, Deep Learning, Computer Graphics
Singapore
Joined May 2018
Introducing ArtiLatent (SIGGRAPH Asia 2025), a high-quality 3D diffusion model that explicitly models object articulation, paving the way for richer, more realistic assets in embodied AI and simulation:
- Generates fully articulated 3D objects
- Physically
2 replies · 37 reposts · 167 likes
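For readers less familiar with articulated assets, the sketch below shows one common way such objects are represented: rigid parts connected by joints that carry a pivot, an axis, and motion limits, posed by rotating child parts about their hinges. It is a generic illustration in Python/NumPy, not ArtiLatent's actual representation; all class and field names are hypothetical.

    import numpy as np
    from dataclasses import dataclass, field

    @dataclass
    class Joint:
        # Revolute joint: the child part rotates about `axis` through `origin`.
        parent: int
        child: int
        origin: np.ndarray                 # (3,) pivot point
        axis: np.ndarray                   # (3,) rotation axis
        limits: tuple = (0.0, np.pi / 2)   # allowed angle range in radians

    @dataclass
    class ArticulatedObject:
        part_vertices: list                          # one (N_i, 3) vertex array per rigid part
        joints: list = field(default_factory=list)

        def pose(self, angles):
            """Return posed copies of all parts (parents stay fixed in this minimal sketch)."""
            posed = [v.copy() for v in self.part_vertices]
            for joint, theta in zip(self.joints, angles):
                theta = float(np.clip(theta, *joint.limits))
                k = joint.axis / np.linalg.norm(joint.axis)
                K = np.array([[0, -k[2], k[1]],
                              [k[2], 0, -k[0]],
                              [-k[1], k[0], 0]])
                # Rodrigues' rotation formula for a rotation of theta about k.
                R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K
                v = posed[joint.child] - joint.origin
                posed[joint.child] = v @ R.T + joint.origin
            return posed

A laptop, for instance, would be two parts (base and lid) plus one hinge joint whose limits span the opening angle; a generator that "explicitly models articulation" has to produce both the part geometry and this joint metadata.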
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
4 replies · 14 reposts · 110 likes
Cool work that connects the idea of volume rendering with image diffusion!
Our paper LaRender received full marks at ICCV 2025 and was selected as an oral! It enables training-free control of occlusion relationships among objects, as well as of visual effects, in diffusion-based image generation. Project page: https://t.co/XzjMZuJ4a4
0 replies · 1 repost · 8 likes
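The volume-rendering connection can be pictured with the classic compositing rule: if each object carries an opacity, front-to-back transmittance turns an ordered stack of per-object latents into blending weights, so raising one object's opacity makes it occlude the others. The sketch below is only that textbook rule applied to latent maps, under assumed shapes; it is not LaRender's formulation or API.

    import torch

    def composite_latents(latents, opacities):
        """Front-to-back alpha compositing of per-object latents.

        latents:   (K, C, H, W) one latent map per object, nearest object first
        opacities: (K,) values in (0, 1]; higher means more occluding
        """
        weights, transmittance = [], 1.0
        for alpha in opacities:
            weights.append(transmittance * alpha)        # w_i = T_i * alpha_i
            transmittance = transmittance * (1 - alpha)  # T_{i+1} = T_i * (1 - alpha_i)
        w = torch.stack(weights)
        w = w / w.sum()                                  # keep the blended latent at a comparable scale
        return (w.view(-1, 1, 1, 1) * latents).sum(dim=0)

    # Object 0 (opacity 0.9) dominates object 1 (opacity 0.4) wherever they overlap.
    blended = composite_latents(torch.randn(2, 4, 64, 64), torch.tensor([0.9, 0.4]))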
Introducing STream3R, a new 3D geometric foundation model for efficient 3D reconstruction from streaming input. Similar to LLMs, STream3R uses causal attention during training and a KV cache at inference. No need to worry about post-alignment or reconstructing from scratch.
Streaming-based 3D/4D Foundation Model: We present STream3R, which reformulates dense 3D/4D reconstruction into a sequential registration task with **causal attention**.
- Projects: https://t.co/zrLlvxJ0FJ
- Code: https://t.co/ONYaJDrjhF
- Model:
5 replies · 58 reposts · 320 likes
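The LLM analogy in the announcement above (causal attention for training, KV caching for inference) can be made concrete with a generic sketch: tokens of each incoming frame attend to the cached keys and values of all earlier frames, so the reconstruction state grows incrementally instead of being re-aligned from scratch. This is standard single-head causal attention with a per-stream cache, not the STream3R code; shapes and names are assumptions.

    import torch
    import torch.nn.functional as F

    class StreamingFrameAttention(torch.nn.Module):
        """Single-head attention over frame tokens with a KV cache."""

        def __init__(self, dim):
            super().__init__()
            self.to_qkv = torch.nn.Linear(dim, 3 * dim)
            self.cache_k = None   # (1, frames_seen * N, dim)
            self.cache_v = None

        @torch.no_grad()
        def step(self, frame_tokens):
            # frame_tokens: (1, N, dim) tokens of the newest frame
            q, k, v = self.to_qkv(frame_tokens).chunk(3, dim=-1)
            if self.cache_k is None:
                self.cache_k, self.cache_v = k, v
            else:
                self.cache_k = torch.cat([self.cache_k, k], dim=1)
                self.cache_v = torch.cat([self.cache_v, v], dim=1)
            # New tokens attend to every cached token (all past frames plus themselves);
            # past frames are never revisited, which is what makes the process streaming.
            return F.scaled_dot_product_attention(q, self.cache_k, self.cache_v)

    attn = StreamingFrameAttention(dim=64)
    for frame in torch.randn(5, 1, 196, 64):   # 5 frames, 196 tokens each
        out = attn.step(frame)                 # (1, 196, 64), updated frame by frame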
Grok 4 one-shots building a gemma-3-270m chatbot with transformers.js, with one-click deploy in anycoder.
9 replies · 13 reposts · 106 likes
Directly training video diffusion models on long videos faces huge memory and learning challenges. How, then, do we model the long-range temporal distribution? Our ICCV 2025 work, TokensGen, offers a solution. We compress videos into a highly condensed token space, enabling
1 reply · 25 reposts · 106 likes
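One way to picture "a highly condensed token space" is a tokenizer that aggressively downsamples a long clip in time and space before any temporal model sees it. The sketch below uses a single strided 3D convolution as a stand-in encoder with made-up sizes; it illustrates the idea only and is not TokensGen's architecture.

    import torch

    class CondensedVideoTokenizer(torch.nn.Module):
        """Downsample a clip (B, C, T, H, W) into a short sequence of tokens."""

        def __init__(self, channels=3, dim=256, t_stride=8, s_stride=16):
            super().__init__()
            # One strided 3D conv stands in for a real encoder: it reduces
            # time by t_stride and space by s_stride in a single step.
            self.encode = torch.nn.Conv3d(
                channels, dim,
                kernel_size=(t_stride, s_stride, s_stride),
                stride=(t_stride, s_stride, s_stride),
            )

        def forward(self, video):
            z = self.encode(video)                      # (B, dim, T', H', W')
            return z.flatten(2).transpose(1, 2)         # (B, T'*H'*W', dim) token sequence

    tok = CondensedVideoTokenizer()
    video = torch.randn(1, 3, 128, 256, 256)            # 128-frame clip
    tokens = tok(video)                                  # (1, 16*16*16, 256) = 4096 tokens
    # A temporal model over `tokens` now sees the whole clip at a fraction of the
    # memory cost of attending over raw frames.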
WorldMem is mainly created by @zeqi_xiao.
Project page: https://t.co/vi78xdY2TT
arXiv: https://t.co/Cu4YwGy7YP
GitHub: https://t.co/3PEcUJDYCw
Demo:
1 reply · 0 reposts · 4 likes
Synthesizing worlds with video diffusion models is often inconsistent: moving the camera back and forth leads to different scenes. We propose WorldMem, a memory-based approach that ensures consistent world simulation without relying on explicit 3D reconstruction.
While recent works like Genie 2, The Matrix, and Navigation World Models explore video generative models as world simulators, world consistency remains underexplored. In this work, we propose WorldMem, introducing a memory mechanism for long-term consistent world simulation.
2 replies · 27 reposts · 148 likes
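A simple mental model for the memory mechanism: store what has already been generated, keyed by camera pose, and retrieve the nearest entries to condition the next frames, so returning to a place reproduces it rather than inventing a new scene. The retrieval rule below (Euclidean distance on camera positions, ignoring orientation) and all names are simplifications for illustration, not WorldMem's design.

    import torch

    class FrameMemory:
        """Store generated frames keyed by camera pose; retrieve the nearest poses."""

        def __init__(self):
            self.poses = []    # each (3,) camera position (orientation omitted for brevity)
            self.frames = []   # each (C, H, W) frame or latent

        def write(self, pose, frame):
            self.poses.append(pose)
            self.frames.append(frame)

        def read(self, query_pose, k=4):
            if not self.poses:
                return []
            d = torch.stack([(p - query_pose).norm() for p in self.poses])
            idx = torch.topk(d, k=min(k, len(self.poses)), largest=False).indices
            return [self.frames[i] for i in idx]

    memory = FrameMemory()
    memory.write(torch.tensor([0.0, 0.0, 0.0]), torch.randn(4, 32, 32))
    memory.write(torch.tensor([5.0, 0.0, 0.0]), torch.randn(4, 32, 32))
    # Moving back toward the origin retrieves what was generated there before,
    # so the generator can be conditioned on it.
    context = memory.read(torch.tensor([0.5, 0.0, 0.0]), k=1)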
Diffusion models are sensitive to small changes in the input noise. We introduce Alias-Free Latent Diffusion Models (AF-LDM) at #CVPR2025. It achieves shift-equivariance and generates consistent outputs. Project: https://t.co/nehjzSFAVU arXiv: https://t.co/CksgC8A0Ph
8 replies · 63 reposts · 409 likes
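Shift-equivariance means that shifting the input should shift the output rather than change its content. The snippet below is a generic diagnostic for that property using integer circular shifts; it is not the AF-LDM method itself, just a way to measure the sensitivity the tweet refers to.

    import torch

    def shift_equivariance_error(net, x, dx=8, dy=8):
        """Relative L2 gap between net(shift(x)) and shift(net(x)) for an integer shift."""
        shifted_in = torch.roll(x, shifts=(dy, dx), dims=(-2, -1))
        out_of_shifted = net(shifted_in)
        shifted_out = torch.roll(net(x), shifts=(dy, dx), dims=(-2, -1))
        return (out_of_shifted - shifted_out).norm() / shifted_out.norm()

    # A purely convolutional net with circular padding is circularly shift-equivariant:
    net = torch.nn.Conv2d(4, 4, 3, padding=1, padding_mode="circular")
    err = shift_equivariance_error(net, torch.randn(1, 4, 64, 64))
    print(f"relative equivariance error: {err.item():.2e}")   # effectively zero for this conv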
The bokeh effect is so important in photography, yet existing text-to-image diffusion models do not support controlling bokeh strength. We introduce Bokeh Diffusion, a T2I diffusion model that supports flexible background blur control! Project: https://t.co/YlnSETImsz
1 reply · 10 reposts · 44 likes
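Scalar controls like a bokeh (blur) strength are commonly injected into diffusion models by embedding the scalar and adding it to the timestep embedding, so every denoiser block sees it. The sketch below shows that common pattern with hypothetical names and sizes; the paper's actual conditioning mechanism may differ.

    import torch

    class ScalarConditioner(torch.nn.Module):
        """Map a scalar in [0, 1] (e.g. bokeh strength) to an embedding vector."""

        def __init__(self, dim=320):
            super().__init__()
            self.mlp = torch.nn.Sequential(
                torch.nn.Linear(1, dim), torch.nn.SiLU(), torch.nn.Linear(dim, dim)
            )

        def forward(self, strength):
            return self.mlp(strength.view(-1, 1))

    cond = ScalarConditioner(dim=320)
    t_emb = torch.randn(2, 320)                   # usual timestep embedding
    bokeh = torch.tensor([0.1, 0.9])              # weak vs. strong background blur
    t_emb = t_emb + cond(bokeh)                   # the denoiser now sees the blur level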
Consistent Multi-View Diffusion for 3D Enhancement
Introducing our work #3DEnhancer @CVPR: a multi-view diffusion model that enhances multi-view images to improve 3D models.
arXiv: https://t.co/eNvgSTsKWN
Project: https://t.co/VDPG5NvRSt
1 reply · 10 reposts · 24 likes
Excited to share Neural LightRig! It allows for accurate and fast estimation of surface normals and PBR materials from just one image. We achieve this by generating multi-light images with a diffusion model, overcoming the estimation ambiguity of inverse rendering. Page:
1 reply · 21 reposts · 66 likes
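Why do multi-light images resolve the ambiguity of single-image inverse rendering? Under a Lambertian assumption, intensities observed under several known light directions determine the surface normal by least squares (classical photometric stereo). The snippet below shows only that classical step as intuition; Neural LightRig's predictors are learned, and the Lambertian model and array shapes here are illustrative assumptions.

    import numpy as np

    def photometric_stereo_normals(images, light_dirs):
        """Per-pixel normals from multi-light images (Lambertian assumption).

        images:     (L, H, W) grayscale images under L known lights
        light_dirs: (L, 3) unit light directions
        Returns (H, W, 3) unit normals.
        """
        L, H, W = images.shape
        I = images.reshape(L, -1)                              # (L, H*W)
        # Solve light_dirs @ (albedo * n) = I for every pixel at once.
        G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)     # (3, H*W)
        n = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-8)
        return n.T.reshape(H, W, 3)

    lights = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [-1, -1, 1]], dtype=float)
    lights /= np.linalg.norm(lights, axis=1, keepdims=True)
    imgs = np.random.rand(4, 64, 64)                    # stand-in for generated relit images
    normals = photometric_stereo_normals(imgs, lights)  # one unit normal per pixel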
Introducing Trajectory Attention for Fine-grained Video Motion Control. By augmenting attention along predefined trajectories, our approach enables tasks such as camera motion control for images and videos, as well as video editing.
1 reply · 11 reposts · 62 likes
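A rough sketch of "augmenting attention along predefined trajectories": gather each tracked point's features across frames and run attention within that gathered sequence, so information flows along the motion path. The gathering scheme, shapes, and the bare attention call below are assumptions for illustration, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def trajectory_attention(features, trajectories):
        """Attend along precomputed point trajectories.

        features:     (T, H, W, C) per-frame feature maps
        trajectories: (P, T, 2) integer (y, x) location of each of P tracked points per frame
        Returns (P, T, C) features updated by attention along each trajectory.
        """
        T, H, W, C = features.shape
        ys, xs = trajectories[..., 0], trajectories[..., 1]   # (P, T)
        t_idx = torch.arange(T).expand_as(ys)                 # (P, T)
        tokens = features[t_idx, ys, xs]                      # (P, T, C) features along each track
        # Self-attention within each trajectory: every time step of a track
        # exchanges information with every other time step of the same track.
        return F.scaled_dot_product_attention(tokens, tokens, tokens)

    feats = torch.randn(8, 32, 32, 64)                        # 8 frames of features
    trajs = torch.randint(0, 32, (16, 8, 2))                  # 16 tracked points
    out = trajectory_attention(feats, trajs)                  # (16, 8, 64)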
Introducing SAR3D, which tokenizes 3D objects into multiscale tokens and generates 3D objects by autoregressive next-scale prediction. SAR3D enables fast 3D generation and comprehensive 3D understanding. arXiv: https://t.co/xIKWx8o8I4 Project: https://t.co/8hUptJOubR
2 replies · 53 reposts · 240 likes
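Next-scale autoregressive prediction generates a coarse token grid first and conditions each finer grid on everything produced so far. The loop below is a heavily simplified sketch of that schedule for 3D token grids, with a stand-in model and a random codebook; it is not SAR3D's tokenizer or transformer, and in a real system each position of the next scale would get its own logits rather than sharing one distribution.

    import torch

    def next_scale_generate(model, codebook, scales=(1, 2, 4, 8), dim=256):
        """Coarse-to-fine autoregressive generation over 3D token grids.

        model:    maps a (1, N, dim) prefix of token embeddings to (1, N, V) logits
        codebook: (V, dim) embedding table shared by all scales
        """
        prefix = torch.zeros(1, 1, dim)                     # start token
        generated = []
        for s in scales:
            n = s ** 3                                      # tokens in an s x s x s grid
            logits = model(prefix)[:, -1]                   # condition on all coarser scales
            probs = torch.softmax(logits, dim=-1)
            # The stand-in model reuses one distribution for all n positions of the
            # next grid; a real next-scale model predicts per-position logits in parallel.
            ids = torch.multinomial(probs.expand(n, -1), 1).squeeze(-1)   # (n,) token ids
            generated.append(ids.view(s, s, s))
            prefix = torch.cat([prefix, codebook[ids].unsqueeze(0)], dim=1)
        return generated                                    # list of grids, coarse to fine

    V, dim = 512, 256
    codebook = torch.randn(V, dim)
    toy_model = lambda x: torch.randn(1, x.shape[1], V)     # stand-in for a transformer
    grids = next_scale_generate(toy_model, codebook, dim=dim)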
Introducing GaussianAnything, a new 3D generative model with two key properties:
- A structured point-cloud latent space enabling flexible editing!
- Support for multi-modal conditions, e.g., point cloud, text, and single/multi-view images
arXiv: https://t.co/fahQOFeDAa
9 replies · 50 reposts · 300 likes
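A structured point-cloud latent can be pictured as N points that each carry a position plus a feature vector, which is what makes edits as direct as moving or masking a subset of points before decoding. The snippet below is only that picture with made-up sizes, not GaussianAnything's latent definition.

    import torch

    # A point-cloud-structured latent: N points, each with xyz plus a C-dim feature.
    N, C = 2048, 16
    xyz = torch.rand(N, 3) * 2 - 1             # positions in [-1, 1]^3
    feats = torch.randn(N, C)
    latent = torch.cat([xyz, feats], dim=-1)   # (N, 3 + C)

    # "Flexible editing": operate on points directly, e.g. lift everything
    # in the upper half of the object by 0.2 before decoding.
    mask = latent[:, 2] > 0.0                  # points with z > 0
    edited = latent.clone()
    edited[mask, 2] += 0.2                     # a decoder would then render the edited shape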