Hao Zhao
@HaoZhao_AIRSUN
608 Followers · 959 Following · 11 Media · 58 Statuses
https://t.co/l52lTpdO16 Computer vision is good, have fun.
Tsinghua University
Joined July 2024
GaussianArt (3DV 2026) is here! A single-stage unified geometry–motion model that finally scales articulated reconstruction to 20+ parts with order-of-magnitude higher accuracy. Evaluated on MPArt-90, the largest articulated benchmark to date. Code + project page below.
If you're excited by Tesla's new world model, meet OmniNWM, our research take on panoramic, controllable driving world models:
• ultra-long demos
• precise camera control
• RGB/semantics/depth/occupancy
• intrinsic closed-loop rewards
arXiv: https://t.co/CKa5Bd9bqr Watch:
InvRGB+L (ICCV'25): inverse rendering of large, dynamic scenes from a single RGB+LiDAR sequence. We add a specular LiDAR reflectance model + RGB–LiDAR material consistency, yielding reliable albedo/roughness, relighting, night sim, and realistic object insertion. Paper:
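As I read the tweet, the key coupling is that LiDAR intensity and RGB appearance must be explained by the same materials. A minimal sketch of such a joint objective in PyTorch; the function name, loss weights, and the exact form of the consistency term are my assumptions, not the paper's formulation:

```python
import torch

def invrgbl_loss(rgb_render, rgb_gt, lidar_render, lidar_gt,
                 albedo_rgb, albedo_lidar, w_lidar=0.5, w_cons=0.1):
    """Hypothetical joint objective: an RGB photometric term, a LiDAR
    intensity term rendered through a specular reflectance model, and a
    consistency term pushing both sensors toward the same albedo."""
    l_rgb = (rgb_render - rgb_gt).abs().mean()         # RGB reconstruction
    l_lidar = (lidar_render - lidar_gt).abs().mean()   # LiDAR intensity fit
    l_cons = (albedo_rgb - albedo_lidar).abs().mean()  # RGB-LiDAR material consistency
    return l_rgb + w_lidar * l_lidar + w_cons * l_cons
```

A consistency term of this kind is also what makes night simulation plausible: LiDAR is active sensing, so it constrains albedo even where the RGB frames are dark.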
TA-VLA: Torque-Aware Vision-Language-Action models. We show how to inject force cues into VLAs for contact-rich manipulation. Three takeaways (a sketch of the first two follows below):
✅ Where: put torque adapters in the decoder, not the encoder.
✅ How: use a single-token summary of torque history.
✅ Why: jointly …
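Here is a minimal sketch of that decoder-side injection, assuming a PyTorch VLA with a transformer action decoder; the module name, dimensions, and the attention-pooling choice are illustrative, not TA-VLA's released code:

```python
import torch
import torch.nn as nn

class TorqueAdapter(nn.Module):
    """Compress a torque history (T steps x n_joints) into one summary
    token for the action decoder to attend over (hypothetical sketch)."""
    def __init__(self, n_joints: int = 7, d_model: int = 512):
        super().__init__()
        self.proj = nn.Linear(n_joints, d_model)
        self.pool = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))  # learned pooling query

    def forward(self, torque_hist: torch.Tensor) -> torch.Tensor:
        # torque_hist: (B, T, n_joints) -> (B, 1, d_model) summary token
        h = self.proj(torque_hist)
        q = self.query.expand(h.size(0), -1, -1)
        token, _ = self.pool(q, h, h)
        return token

# Decoder-side injection ("where"): prepend the single token ("how") to the
# sequence the action head decodes from, leaving the vision-language encoder
# untouched:
#   dec_in = torch.cat([adapter(torque_hist), decoder_tokens], dim=1)
```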
Thrilled to share that our paper "OnePoseViaGen" received three strong accepts at #CoRL2025! Check it out: https://t.co/U54kgzIXGy Congrats to the amazing team!
github.com: [CoRL 2025 Oral] One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation - GZWSAMA/OnePoseviaGen
Excited to share that our RGB-D version of OnePoseViaGen has been accepted to #CoRL2025 with three strong accepts! Code coming soon at https://t.co/1D2sT9l9P2. 6D pose estimation is crucial for robotics, but generalizing to generated objects remains challenging. We introduce …
Mind-blowing! OnePoseViaGen can track 6D object poses directly from any input video, no special setup needed! From just an RGB video, it reconstructs 2D depth, 3D shape, 4D point dynamics, and 6D pose. Try it here:
huggingface.co
We are thrilled to release an alpha version of OnePoseViaGen, a system for Panoptic 4D Scene Reconstruction from RGB video! Try it at: https://t.co/RFKddVaVCV
#AI #3D #GenerativeAI
New code release! We've open-sourced DiST-4D, the first feed-forward world model that simultaneously handles temporal prediction and spatial novel-view synthesis for autonomous driving scenes.
• Disentangled spatio-temporal diffusion
• Metric-depth bridge for 4D RGB-D (sketched below)
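My reading of the "metric-depth bridge": predicted metric depth lets one branch's RGB-D output be reprojected into the other branch's target viewpoint as conditioning. A generic depth-reprojection sketch of that idea; the camera conventions and function name are mine, not from the DiST-4D repo:

```python
import torch

def reproject(depth, K, T_src_to_tgt):
    """Map every source pixel to its target-view pixel using metric depth.
    depth: (H, W) metric depth, K: (3, 3) intrinsics,
    T_src_to_tgt: (4, 4) relative camera pose (hypothetical convention)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).reshape(3, -1).float()
    # back-project pixels to metric 3D points in the source camera frame
    pts = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts_h = torch.cat([pts, torch.ones(1, pts.shape[1])], 0)
    # transform into the target camera frame and project back to pixels
    proj = K @ (T_src_to_tgt @ pts_h)[:3]
    uv = proj[:2] / proj[2].clamp(min=1e-6)
    return uv.reshape(2, H, W)
```

Because the depth is metric rather than relative, the same warp stays valid across cameras and timesteps, which is presumably what lets a single model serve both prediction and novel-view synthesis.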
Great to see @dreamingtulpa showcasing SyncTalk++! We've leveled up the original NeRF-based SyncTalk with 3D Gaussian Splatting:
• Dynamic Portrait Renderer for sharper, consistent identity
• FaceSync + HeadSync for spot-on lip & pose alignment
• 101 fps real-time
Photometric stereo meets VGGT: LINO leverages geometry backbones + light register tokens to deliver universal, 4K-detailed normal maps under arbitrary lighting. Thanks for the post @zhenjun_zhao
Light of Normals: Unified Feature Representation for Universal Photometric Stereo. Hong Li, Houyuan Chen, @ychngji6, @Frozen_Burning, Bohan Li, @xshocng1, Xianda Guo, Xuhui Liu, Yikai Wang, Baochang Zhang, Satoshi Ikehata, Boxin Shi, @raoanyi, @HaoZhao_AIRSUN. tl;dr: learnable …
LINO = VGGT + Learnable Light Tokens + Detail-Aware Losses. Huge thanks to @raoanyi @chen_yuan76802, loved building this together! Project:
houyuanchen111.github.io
Universal Photometric Stereo (PS) aims for robust normal maps under any light. But big hurdles remain!
1️⃣ Deep coupling: ambiguous intensity. Is it the light changing or the surface turning?
2️⃣ Detail loss: complex surfaces (shadows, inter-reflections, fine details) stump …
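The light-token idea is easy to picture as a sketch: give the backbone dedicated learnable slots that soak up lighting variation so the patch features stay geometric. A minimal, assumption-laden version (token count, dimensions, and the name LightRegisters are mine, not LINO's code):

```python
import torch
import torch.nn as nn

class LightRegisters(nn.Module):
    """Hypothetical sketch: learnable light 'register' tokens concatenated
    with the patch tokens of a VGGT-style geometry backbone before the
    transformer blocks, giving lighting a place to live outside the
    per-pixel geometry features."""
    def __init__(self, n_registers: int = 4, d_model: int = 1024):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, n_registers, d_model))
        nn.init.trunc_normal_(self.registers, std=0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, d) -> (B, n_registers + N, d)
        regs = self.registers.expand(patch_tokens.size(0), -1, -1)
        return torch.cat([regs, patch_tokens], dim=1)
```

This targets hurdle 1️⃣ directly: if lighting explains an intensity change, the registers can absorb it; if geometry does, the patch tokens keep it.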
Combining VGGT with lighting registers gives rise to today's strongest foundation model for photometric stereo. Thanks @_akhaliq for highlighting our work on LINO: predicting ultra-detailed 4K normal maps from unified features!
Just arrived in LA! We will present 4 papers at @RoboticsSciSys, including award-candidate paper Reactive Diffusion Policy @HanXue012, DemoGen @ZhengrongX, DoGlove @DoubleHan07, and the Morpheus robot face @HaoZhao_AIRSUN. I'll also share thoughts about OOD generalization in workshops.
Simulate Any Radar (SA-Radar) is here! We present a controllable, efficient, and realistic radar simulation system via waveform-parameterized attribute embedding (sketched below). Supports:
• cross-sensor simulation
• attribute editing
• scene augmentation
• RAD cube generation
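A rough sketch of what "waveform-parameterized attribute embedding" could look like in PyTorch; the attribute list, dimensions, and class name are my assumptions, not SA-Radar's released interface:

```python
import torch
import torch.nn as nn

class RadarAttributeEmbedding(nn.Module):
    """Hypothetical sketch: map waveform attributes (bandwidth, chirp
    duration, number of chirps, ...) to a conditioning vector for the
    RAD-cube generator, so one network can imitate many sensors."""
    def __init__(self, n_attrs: int = 5, d_embed: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_attrs, d_embed),
            nn.SiLU(),
            nn.Linear(d_embed, d_embed),
        )

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        # attrs: (B, n_attrs), normalized waveform parameters
        return self.mlp(attrs)  # (B, d_embed) conditioning vector
```

Cross-sensor simulation and attribute editing would then fall out of the same mechanism: swap or perturb the attribute vector and regenerate the RAD cube.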
Just published in Nature Machine Intelligence! F-TAC Hand embeds high-res touch (0.1 mm) across 70% of a biomimetic robotic hand, enabling adaptive, human-like grasping across 600 real-world trials.
Our paper, published in Nature Machine Intelligence, presents a system with full-hand tactile sensing and sensory-motor feedback for adaptive, human-like grasping, advancing embodied agents in real-world operation. Article: https://t.co/LwN7ZzyOfx Demo: https://t.co/rhi0cztkdp
Meet Morpheus, a neural-driven animatronic face that doesn't just talk, it feels. Hybrid actuation (rigid + tendon) makes it expressive and compact. Self-modeling + audio-to-blendshape = real-time emotional reactions (sketched below). Watch it smile, frown, cringe... all …
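For the audio-to-blendshape stage, a bare-bones regressor is enough to convey the idea; the mel input, GRU, and 52-coefficient blendshape convention are all assumptions on my part, not Morpheus internals:

```python
import torch
import torch.nn as nn

class AudioToBlendshape(nn.Module):
    """Hypothetical sketch: audio features stream through a recurrent net
    and regress per-frame blendshape coefficients, which a self-modeled
    actuation map would then convert into motor commands."""
    def __init__(self, n_mels: int = 80, hidden: int = 256, n_blendshapes: int = 52):
        super().__init__()
        self.gru = nn.GRU(n_mels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_blendshapes)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (B, T, n_mels) -> (B, T, n_blendshapes) in [0, 1]
        h, _ = self.gru(mel)
        return torch.sigmoid(self.head(h))
```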
Looking forward to presenting three works at #CVPR2025 next week! Come check them out if you're interested in dynamic modeling, 3D physical reasoning, or generative driving scenes:
• PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
github.com: GasaiYU/PartRM
We are one week away from #computervision's largest conference, #CVPR2025! What are you most excited to see?
Tsinghua & Bosch just dropped Impromptu VLA: the SOTA fully open-source, end-to-end Vision-Language-Action model for autonomous driving. No BEV, no planner, just raw video → natural language → action. Beats BridgeAD on NeuroNCAP (2.15 vs. 1.60). https://t.co/Ub2jisUnKG
We propose **Challenger**, a framework to generate **photorealistic adversarial driving videos**!
- Diverse scenarios: **cut-ins, tailgating, blocking**, without human supervision
- **8.6× to 26.1×** higher collision rates for SOTA AD models
- **Transferable** …
The physical realism here is wild. PhysGen3D easily beats most closed-source stuff (Kling, Runway, Pika…) when it comes to physics-informed generation. #CVPR2025
Can we make images interactive with realistic physics? Thrilled to share our #CVPR2025 work: PhysGen3D! From just a single image, PhysGen3D creates an interactive, physics-informed 3D scene, enabling us to explore and simulate realistic future scenarios interactively.