Jiageng Mao
@PointsCoder
Followers 563 · Following 259 · Media 10 · Statuses 68
PhD Student @ USC CS
Los Angeles, CA
Joined July 2021
Video Generation Enables Zero-Shot Robotic Manipulation. Introducing PhysWorld, a framework that bridges video generation and robot learning through (generated) real-to-sim world modeling. Project: https://t.co/9mRqPqr5TS · Paper: https://t.co/wmkEpmUGhq · Code:
7 · 40 · 174
A good way to use a video-generation world model in robotics.
0 · 2 · 4
Generative video is becoming a new form of simulation. PhysWorld links video generation with robot learning, turning visual synthesis into real-to-sim modeling where zero-shot manipulation starts to emerge.
0 · 7 · 8
How can we turn a generated video into a robotic demonstration? Check out @PointsCoder's recent work, PhysWorld. We have also open-sourced the whole pipeline, which will hopefully make real-to-sim simpler.
0 · 8 · 59
Great work from our student researcher Jiageng Mao @PointsCoder on enabling scalable robot learning by imitating AI-generated videos.
0 · 2 · 9
This is the most impressive world model → physical AI training project I have seen published. I know world models are going to be a large part of closing the simulation data gap, and this really puts all of the pieces together. #Robotics #Simulation
4 · 7 · 79
This work was led by Jiageng as a student researcher project at @GoogleDeepMind, in collaboration with @SichengHe12345, Hao-Ning Wu, Yang You, @Kevin_SSY, Zhicheng Wang, Yanan Bao, Huizhong Chen, @GuibasLeonidas, @vitorguizilini, and @howardzzh, and was advised by @yuewang314.
0 · 0 · 6
What did we find? By coupling video generation with physical world modeling, PhysWorld transforms purely visual signals into physically feasible actions:
- Enables zero-shot real-world manipulation
- Improves success rate by +15% over prior video-imitation methods
0 · 0 · 5
What is PhysWorld? PhysWorld enables robots to learn manipulation skills without real-world demonstrations. Given just one image and a task prompt, it:
1. Generates a task-conditioned video showing how to complete the task
2. Reconstructs a physically interactable 3D scene
0 · 2 · 3
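A rough sketch of the two-stage recipe described in the tweet above: generate a task-conditioned video, lift the image into a physics-enabled scene, then retarget the imagined motion into robot actions. All function names and interfaces below are hypothetical placeholders, not PhysWorld's actual API; the open-sourced code linked in the announcement is the authoritative reference.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Scene3D:
    """Physically interactable reconstruction of the input image (placeholder)."""
    meshes: list
    physics_params: dict

@dataclass
class Action:
    """A single robot command, e.g. an end-effector pose target (placeholder)."""
    pose: Sequence[float]
    gripper: float

def physworld_style_pipeline(
    image,                           # single RGB observation of the workspace
    task_prompt: str,                # natural-language task description
    generate_video: Callable,        # (image, prompt) -> list of frames (video model)
    reconstruct_scene: Callable,     # image -> Scene3D (real-to-sim reconstruction)
    track_object_motion: Callable,   # (frames, scene) -> object trajectory
    solve_robot_actions: Callable,   # (trajectory, scene) -> List[Action]
) -> List[Action]:
    """Hypothetical generate -> reconstruct -> retarget loop.

    1. A video model imagines the task being completed from the input image.
    2. The image is lifted into a physics-enabled 3D scene (real-to-sim).
    3. Object motion extracted from the generated video is replayed in the
       simulated scene and converted into physically feasible robot actions.
    """
    frames = generate_video(image, task_prompt)        # step 1: task-conditioned video
    scene = reconstruct_scene(image)                   # step 2: interactable 3D scene
    object_traj = track_object_motion(frames, scene)   # extract target object motion
    return solve_robot_actions(object_traj, scene)     # step 3: physically grounded actions
```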
Unified multimodal models can generate text and images, but can they truly reason across modalities? Introducing ROVER, the first benchmark that evaluates reciprocal cross-modal reasoning in unified models, the next frontier of omnimodal intelligence. Project:
5 · 29 · 236
Emily is presenting her first-ever paper at #ICCV2025. Come by and have a chat with her!
Excited to share our #ICCV2025 paper (@yuewang314 @PointsCoder): "Learning an Implicit Physics Model for Image-based Fluid Simulation". We present a physics-informed neural network that generates 4D, physically consistent fluid animations from a single image, guided by
0 · 0 · 7
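For context, physics-informed networks of this kind are typically trained with a loss that combines a data or appearance term with residuals of the governing flow equations. The expression below is a generic example under an incompressible-flow assumption, shown only to illustrate the idea; it is not necessarily the exact objective used in this paper.

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{data}}
\;+\; \lambda_{\text{phys}}
\Big\| \partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u}
  + \tfrac{1}{\rho}\nabla p - \nu \nabla^{2}\mathbf{u} \Big\|^{2}
\;+\; \lambda_{\text{div}} \big\| \nabla\cdot\mathbf{u} \big\|^{2}
```

Here the second term penalizes violation of the momentum equation for the predicted velocity field u and pressure p, and the third enforces incompressibility.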
Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion.
- Python + GPU-optimized implementation, no C++ anymore!
- 40× faster than COLMAP with 5K images on a single GPU!
- Scales beyond 100 images (more than VGGT/VGGSfM can consume)!
- Supports metric scale.
5 · 47 · 351
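For background on what such a pipeline optimizes at its core: COLMAP and GPU-parallel reimplementations alike ultimately solve a bundle-adjustment problem of roughly the form below (robust reprojection error over camera poses and 3D points). This is the standard SfM formulation, not anything specific to InstantSfM.

```latex
\min_{\{\mathbf{R}_i,\mathbf{t}_i\},\,\{\mathbf{X}_j\}}
\;\sum_{(i,j)\in\mathcal{O}}
\rho\!\left( \left\| \pi\!\left(\mathbf{R}_i \mathbf{X}_j + \mathbf{t}_i\right) - \mathbf{x}_{ij} \right\|^{2} \right)
```

where π is the camera projection, x_ij the observed keypoint of 3D point j in image i, O the set of observations, and ρ a robust loss.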
Check out our new humanoid whole-body manipulation dataset!
Introducing Humanoid Everyday, a large, real-world dataset for humanoid whole-body manipulation. Unlike most humanoid data (fixed bases, narrow tasks), ours covers diverse, locomotion-integrated skills. Website: https://t.co/0wmXltt13R · Paper: https://t.co/lt8V6HZIO3
1 · 3 · 38
Check out our work on leveraging Internet images for robotic manipulation!
(1/n) Ever wondered if a single in-the-wild image could generate photorealistic robotic demonstrations? Excited to share our #CoRL2025 paper, Robot Learning from Any Images (RoLA), a framework that transforms any in-the-wild image into an interactive, physics-enabled
0 · 0 · 7
Join Us: Research Internships in Embodied Intelligence. The USC Geometry, Vision, and Learning Lab (https://t.co/MP3PFbYx2L) is seeking highly motivated interns to push the frontiers of AI, robotics, and 3D computer vision. You'll work on large-scale VLA models,
7 · 25 · 191
This project is co-led by our incredible intern Wei Chow and me, and I am especially grateful to my advisor, @yuewang314, for his invaluable guidance and support throughout this work. We also deeply appreciate the contributions and insights of @Boyiliee, @DanielSeita, and
0 · 0 · 9
How do we fix this? Introducing PhysAgent, a new framework that enhances VLMs by integrating:
- Vision foundation models (Depth, SAM, GroundingDINO)
- A physics knowledge memory for improved reasoning
- Chain-of-thought inference for self-verification
PhysAgent boosts
1 · 1 · 14
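A loose sketch of how an agent loop like the one described above could be wired together: perception tools produce structured visual cues, a memory supplies physics knowledge, and the VLM reasons over both and then checks itself. All names and interfaces below are illustrative stand-ins, not PhysAgent's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class PhysicsAgentSketch:
    """Hypothetical orchestration of a VLM with perception tools and a physics memory."""
    vlm: Callable[[str], str]              # prompt -> answer (base VLM)
    perception_tools: Dict[str, Callable]  # e.g. {"depth": ..., "segment": ..., "ground": ...}
    physics_memory: List[str] = field(default_factory=list)  # stored physics facts/rules

    def answer(self, image, question: str) -> str:
        # 1) Run vision foundation tools to extract structured cues from the image.
        cues = {name: tool(image) for name, tool in self.perception_tools.items()}

        # 2) Retrieve relevant physics knowledge (here: a naive keyword match).
        words = question.lower().split()
        facts = [f for f in self.physics_memory if any(w in f.lower() for w in words)]

        # 3) Ask the VLM to reason step by step over cues + retrieved knowledge.
        prompt = (
            f"Question: {question}\n"
            f"Visual cues: {cues}\n"
            f"Physics knowledge: {facts}\n"
            "Think step by step, then state the final answer."
        )
        draft = self.vlm(prompt)

        # 4) Self-verification pass: ask the VLM to check and correct its own reasoning.
        return self.vlm(f"Verify the reasoning below and correct it if needed:\n{draft}")
```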
What did we find? We evaluated 75 top VLMs, including GPT-4o, Gemini, and open-source models, and found:
- Strong commonsense reasoning but poor physical reasoning
- Closed-source models outperform open-source ones, but still struggle
- Scaling data and model size does not
1 · 2 · 13
What is PhysBench? PhysBench is a comprehensive benchmark with 10,002 video-image-text entries that assess VLMs across four major domains:
1. Physical object properties (number, mass, stiffness, elasticity, etc.)
2. Physical object relationships (distances, depths, velocities,
1 · 0 · 11
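The format implied above (video-image-text entries grouped into domains) could be represented with a record along the lines of the sketch below; the field names are guesses for illustration, not the released PhysBench schema.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class PhysBenchStyleEntry:
    """Illustrative record for one video-image-text item (field names are assumptions)."""
    entry_id: str
    domain: str                 # e.g. "object_property", "object_relationship", ...
    video_path: Optional[str]   # some entries may be image-only
    image_paths: List[str]
    question: str
    choices: List[str]          # multiple-choice options
    answer_index: int           # index of the correct choice

def accuracy(entries: List[PhysBenchStyleEntry],
             predict: Callable[[PhysBenchStyleEntry], int]) -> float:
    """Simple multiple-choice accuracy of a model's predictions over the entries."""
    correct = sum(predict(e) == e.answer_index for e in entries)
    return correct / max(len(entries), 1)
```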
Can Vision-Language Models (VLMs) truly understand the physical world? Introducing PhysBench, the first benchmark to evaluate VLMs' understanding of physics! PhysBench is accepted to #ICLR2025 as an Oral presentation (only 1.8% of 11k submissions)! Project:
5 · 72 · 413