Hao Zhao
@HaoZhao_AIRSUN
608 Followers · 959 Following · 11 Media · 58 Statuses
https://t.co/l52lTpdO16 Computer vision is good, have fun.
Tsinghua University
Joined July 2024
GaussianArt (3DV 2026) is here! A single-stage unified geometry–motion model that finally scales articulated reconstruction to 20+ parts with order-of-magnitude higher accuracy. Evaluated on MPArt-90, the largest articulated benchmark to date. Code + project page below.
If you're excited by Tesla's new world model, meet OmniNWM, our research take on panoramic, controllable driving world models:
• ultra-long demos
• precise camera control
• RGB/semantics/depth/occupancy
• intrinsic closed-loop rewards
arXiv: https://t.co/CKa5Bd9bqr Watch:
InvRGB+L (ICCV'25): inverse rendering of large, dynamic scenes from a single RGB+LiDAR sequence. We add a specular LiDAR reflectance model + RGB–LiDAR material consistency, yielding reliable albedo/roughness, relighting, night sim, and realistic object insertion. Paper:
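As I read the tweet, the key coupling is that LiDAR intensity and RGB appearance must be explained by the same materials. A minimal sketch of such a joint objective in PyTorch; the function name, loss weights, and the exact form of the consistency term are my assumptions, not the paper's formulation:

```python
import torch

def invrgbl_loss(rgb_render, rgb_gt, lidar_render, lidar_gt,
                 albedo_rgb, albedo_lidar, w_lidar=0.5, w_cons=0.1):
    """Hypothetical joint objective: an RGB photometric term, a LiDAR
    intensity term rendered through a specular reflectance model, and a
    consistency term pushing both sensors toward the same albedo."""
    l_rgb = (rgb_render - rgb_gt).abs().mean()         # RGB reconstruction
    l_lidar = (lidar_render - lidar_gt).abs().mean()   # LiDAR intensity fit
    l_cons = (albedo_rgb - albedo_lidar).abs().mean()  # RGB-LiDAR material consistency
    return l_rgb + w_lidar * l_lidar + w_cons * l_cons
```

A consistency term of this kind is also what makes night simulation plausible: LiDAR is active sensing, so it constrains albedo even where the RGB frames are dark.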
TA-VLA: Torque-Aware Vision-Language-Action models. We show how to inject force cues into VLAs for contact-rich manipulation. Three takeaways (a sketch of the first two follows below):
✅ Where: put torque adapters in the decoder, not the encoder.
✅ How: use a single-token summary of torque history.
✅ Why: jointly …
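Here is a minimal sketch of that decoder-side injection, assuming a PyTorch VLA with a transformer action decoder; the module name, dimensions, and the attention-pooling choice are illustrative, not TA-VLA's released code:

```python
import torch
import torch.nn as nn

class TorqueAdapter(nn.Module):
    """Compress a torque history (T steps x n_joints) into one summary
    token for the action decoder to attend over (hypothetical sketch)."""
    def __init__(self, n_joints: int = 7, d_model: int = 512):
        super().__init__()
        self.proj = nn.Linear(n_joints, d_model)
        self.pool = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))  # learned pooling query

    def forward(self, torque_hist: torch.Tensor) -> torch.Tensor:
        # torque_hist: (B, T, n_joints) -> (B, 1, d_model) summary token
        h = self.proj(torque_hist)
        q = self.query.expand(h.size(0), -1, -1)
        token, _ = self.pool(q, h, h)
        return token

# Decoder-side injection ("where"): prepend the single token ("how") to the
# sequence the action head decodes from, leaving the vision-language encoder
# untouched:
#   dec_in = torch.cat([adapter(torque_hist), decoder_tokens], dim=1)
```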
Thrilled to share that our paper "OnePoseViaGen" received three strong accepts at #CoRL2025! Check it out: https://t.co/U54kgzIXGy Congrats to the amazing team!
github.com: [CoRL 2025 Oral] One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation - GZWSAMA/OnePoseviaGen
Excited to share that our RGB-D version of OnePoseViaGen has been accepted to #CoRL2025 with three strong accepts! Code coming soon at https://t.co/1D2sT9l9P2. 6D pose estimation is crucial for robotics, but generalizing to generated objects remains challenging. We introduce …
Mind-blowing! OnePoseViaGen can track 6D object poses directly from any input video, no special setup needed! From just an RGB video, it reconstructs 2D depth, 3D shape, 4D point dynamics, and 6D pose. Try it here:
huggingface.co
We are thrilled to release an alpha version of OnePoseViaGen, a system for Panoptic 4D Scene Reconstruction from RGB video! Try it at: https://t.co/RFKddVaVCV
#AI #3D #GenerativeAI
New code release! We've open-sourced DiST-4D, the first feed-forward world model that simultaneously handles temporal prediction and spatial novel-view synthesis for autonomous driving scenes.
• Disentangled spatio-temporal diffusion
• Metric-depth bridge for 4D RGB-D (sketched below)
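My reading of the "metric-depth bridge": predicted metric depth lets one branch's RGB-D output be reprojected into the other branch's target viewpoint as conditioning. A generic depth-reprojection sketch of that idea; the camera conventions and function name are mine, not from the DiST-4D repo:

```python
import torch

def reproject(depth, K, T_src_to_tgt):
    """Map every source pixel to its target-view pixel using metric depth.
    depth: (H, W) metric depth, K: (3, 3) intrinsics,
    T_src_to_tgt: (4, 4) relative camera pose (hypothetical convention)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).reshape(3, -1).float()
    # back-project pixels to metric 3D points in the source camera frame
    pts = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts_h = torch.cat([pts, torch.ones(1, pts.shape[1])], 0)
    # transform into the target camera frame and project back to pixels
    proj = K @ (T_src_to_tgt @ pts_h)[:3]
    uv = proj[:2] / proj[2].clamp(min=1e-6)
    return uv.reshape(2, H, W)
```

Because the depth is metric rather than relative, the same warp stays valid across cameras and timesteps, which is presumably what lets a single model serve both prediction and novel-view synthesis.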
Great to see @dreamingtulpa showcasing SyncTalk++! We've leveled up the original NeRF-based SyncTalk with 3D Gaussian Splatting:
• Dynamic Portrait Renderer for sharper, consistent identity
• FaceSync + HeadSync for spot-on lip & pose alignment
• 101 fps real-time
Photometric stereo meets VGGT: LINO leverages geometry backbones + light register tokens to deliver universal, 4K-detailed normal maps under arbitrary lighting. Thanks for the post @zhenjun_zhao
Light of Normals: Unified Feature Representation for Universal Photometric Stereo. Hong Li, Houyuan Chen, @ychngji6, @Frozen_Burning, Bohan Li, @xshocng1, Xianda Guo, Xuhui Liu, Yikai Wang, Baochang Zhang, Satoshi Ikehata, Boxin Shi, @raoanyi, @HaoZhao_AIRSUN. tl;dr: learnable …
LINO = VGGT + Learnable Light Tokens + Detail-Aware Losses. Huge thanks to @raoanyi @chen_yuan76802, loved building this together! Project:
houyuanchen111.github.io
Universal Photometric Stereo (PS) aims for robust normal maps under any light. But big hurdles remain!
1️⃣ Deep coupling: ambiguous intensity. Is it the light changing or the surface turning?
2️⃣ Detail loss: complex surfaces (shadows, inter-reflections, fine details) stump …
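The light-token idea is easy to picture as a sketch: give the backbone dedicated learnable slots that soak up lighting variation so the patch features stay geometric. A minimal, assumption-laden version (token count, dimensions, and the name LightRegisters are mine, not LINO's code):

```python
import torch
import torch.nn as nn

class LightRegisters(nn.Module):
    """Hypothetical sketch: learnable light 'register' tokens concatenated
    with the patch tokens of a VGGT-style geometry backbone before the
    transformer blocks, giving lighting a place to live outside the
    per-pixel geometry features."""
    def __init__(self, n_registers: int = 4, d_model: int = 1024):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, n_registers, d_model))
        nn.init.trunc_normal_(self.registers, std=0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, d) -> (B, n_registers + N, d)
        regs = self.registers.expand(patch_tokens.size(0), -1, -1)
        return torch.cat([regs, patch_tokens], dim=1)
```

This targets hurdle 1️⃣ directly: if lighting explains an intensity change, the registers can absorb it; if geometry does, the patch tokens keep it.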
Combining VGGT with lighting registers gives rise to today's strongest foundation model for photometric stereo. Thanks @_akhaliq for highlighting our work on LINO: predicting ultra-detailed 4K normal maps from unified features!
Just arrived in LA! We will present 4 papers at @RoboticsSciSys, including award-candidate paper Reactive Diffusion Policy @HanXue012, DemoGen @ZhengrongX, DoGlove @DoubleHan07, and the Morpheus robot face @HaoZhao_AIRSUN. I'll also share thoughts about OOD generalization in workshops.
Simulate Any Radar (SA-Radar) is here! We present a controllable, efficient, and realistic radar simulation system via waveform-parameterized attribute embedding (sketched below). Supports:
• cross-sensor simulation
• attribute editing
• scene augmentation
• RAD cube generation
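A rough sketch of what "waveform-parameterized attribute embedding" could look like in PyTorch; the attribute list, dimensions, and class name are my assumptions, not SA-Radar's released interface:

```python
import torch
import torch.nn as nn

class RadarAttributeEmbedding(nn.Module):
    """Hypothetical sketch: map waveform attributes (bandwidth, chirp
    duration, number of chirps, ...) to a conditioning vector for the
    RAD-cube generator, so one network can imitate many sensors."""
    def __init__(self, n_attrs: int = 5, d_embed: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_attrs, d_embed),
            nn.SiLU(),
            nn.Linear(d_embed, d_embed),
        )

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        # attrs: (B, n_attrs), normalized waveform parameters
        return self.mlp(attrs)  # (B, d_embed) conditioning vector
```

Cross-sensor simulation and attribute editing would then fall out of the same mechanism: swap or perturb the attribute vector and regenerate the RAD cube.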
Just published in Nature Machine Intelligence! F-TAC Hand embeds high-res touch (0.1 mm) across 70% of a biomimetic robotic hand, enabling adaptive, human-like grasping across 600 real-world trials.
Our paper, published in Nature Machine Intelligence, presents a system with full-hand tactile sensing and sensory-motor feedback for adaptive, human-like grasping, advancing embodied agents in real-world operation. Article: https://t.co/LwN7ZzyOfx Demo: https://t.co/rhi0cztkdp
Meet Morpheus, a neural-driven animatronic face that doesn't just talk, it feels. Hybrid actuation (rigid + tendon) makes it expressive and compact. Self-modeling + audio-to-blendshape = real-time emotional reactions (sketched below). Watch it smile, frown, cringe... all …
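For the audio-to-blendshape stage, a bare-bones regressor is enough to convey the idea; the mel input, GRU, and 52-coefficient blendshape convention are all assumptions on my part, not Morpheus internals:

```python
import torch
import torch.nn as nn

class AudioToBlendshape(nn.Module):
    """Hypothetical sketch: audio features stream through a recurrent net
    and regress per-frame blendshape coefficients, which a self-modeled
    actuation map would then convert into motor commands."""
    def __init__(self, n_mels: int = 80, hidden: int = 256, n_blendshapes: int = 52):
        super().__init__()
        self.gru = nn.GRU(n_mels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_blendshapes)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (B, T, n_mels) -> (B, T, n_blendshapes) in [0, 1]
        h, _ = self.gru(mel)
        return torch.sigmoid(self.head(h))
```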
Looking forward to presenting three works at #CVPR2025 next week! Come check them out if you're interested in dynamic modeling, 3D physical reasoning, or generative driving scenes:
• PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
github.com: GasaiYU/PartRM
We are one week away from #computervision's largest conference, #CVPR2025! What are you most excited to see?
Tsinghua & Bosch just dropped Impromptu VLA: the SOTA fully open-source, end-to-end Vision-Language-Action model for autonomous driving. No BEV, no planner, just raw video → natural language → action. Beats BridgeAD on NeuroNCAP (2.15 vs. 1.60). https://t.co/Ub2jisUnKG
We propose **Challenger**, a framework to generate **photorealistic adversarial driving videos**!
- Diverse scenarios: **cut-ins, tailgating, blocking**, without human supervision
- **8.6× to 26.1×** higher collision rates for SOTA AD models
- **Transferable** …
The physical realism here is wild. PhysGen3D easily beats most closed-source stuff (Kling, Runway, Pika…) when it comes to physics-informed generation. #CVPR2025
Can we make images interactive with realistic physics? Thrilled to share our #CVPR2025 work: PhysGen3D! From just a single image, PhysGen3D creates an interactive, physics-informed 3D scene, enabling us to explore and simulate realistic future scenarios interactively.