Jiageng Mao Profile
Jiageng Mao

@PointsCoder

Followers 563
Following 259
Media 10
Statuses 68

PhD Student @ USC CS

Los Angeles, CA
Joined July 2021
@PointsCoder
Jiageng Mao
3 days
🎥 Video Generation Enables Zero-Shot Robotic Manipulation 🤖 Introducing PhysWorld, a framework that bridges video generation and robot learning through (generated) real-to-sim world modeling. 🌐 Project: https://t.co/9mRqPqr5TS 📄 Paper: https://t.co/wmkEpmUGhq 💻 Code:
7
40
174
@JieWang_ZJUI
Jie Wang
2 days
Good way to use video-generation world model in robotics
0
2
4
@cyberne7ic
Cybernetic (πŸ€–/acc)
2 days
Generative video is becoming a new form of simulation. PhysWorld links video generation with robot learning, turning visual synthesis into real-to-sim modeling where zero-shot manipulation starts to emerge.
0
7
8
@yuewang314
Yue Wang
2 days
How can we turn a generated video into a robotic demonstration? Check out @PointsCoder's recent work, PhysWorld. We also open-sourced the whole pipeline, which we hope makes real-to-sim simpler.
0
8
59
@Kevin_SSY
Shuyang (Kevin) Sun
3 days
Great work from our student researcher Jiageng Mao @PointsCoder, enabling scalable robot learning by imitating AI-generated videos.
0
2
9
@jonstephens85
Jonathan Stephens
3 days
This is the most impressive world model -> physical AI training project I have seen published. I know world models are going to be a large part of solving the simulation data gap, and this really puts all of the pieces together. #Robotics #Simulation
4
7
79
@PointsCoder
Jiageng Mao
3 days
This work was led by Jiageng as a student researcher project at @GoogleDeepMind, in collaboration with @SichengHe12345, Hao-Ning Wu, Yang You, @Kevin_SSY, Zhicheng Wang, Yanan Bao, Huizhong Chen, @GuibasLeonidas, @vitorguizilini, and @howardzzh, and advised by @yuewang314.
0
0
6
@PointsCoder
Jiageng Mao
3 days
πŸ” What did we find? By coupling video generation with physical world modeling, PhysWorld transforms purely visual signals into physically feasible actions. βœ… Enables zero-shot real-world manipulation βœ… Improves success rate by +15% over prior video-imitation methods βœ…
0
0
5
@PointsCoder
Jiageng Mao
3 days
💡 What is PhysWorld? PhysWorld enables robots to learn manipulation skills without real-world demonstrations. Given just one image and a task prompt, it: 1️⃣ Generates a task-conditioned video showing how to complete the task 2️⃣ Reconstructs a physically interactable 3D scene
0
2
3
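The recipe in the tweet above (generate a task-conditioned video from one image and a prompt, reconstruct an interactable scene, then turn the generated motion into robot actions) can be summarized in code. The following is a minimal sketch under assumed interfaces; the function names and data structures are placeholders, not the released PhysWorld API.

```python
# Hypothetical sketch of a video-to-action pipeline in the spirit of PhysWorld.
# All function names and data structures are placeholders, not the actual API.
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    pixels: bytes          # raw image data for one generated video frame

@dataclass
class Action:
    gripper_pose: tuple    # (x, y, z, roll, pitch, yaw) target for the robot

def generate_task_video(image: bytes, prompt: str) -> List[Frame]:
    """Step 1: a video generation model produces a task-conditioned video."""
    return [Frame(pixels=image)]  # stub: a real system calls a video diffusion model

def reconstruct_scene(image: bytes) -> dict:
    """Step 2: build a physically interactable 3D scene from the input image."""
    return {"objects": [], "physics": "rigid-body"}  # stub reconstruction

def extract_actions(video: List[Frame], scene: dict) -> List[Action]:
    """Step 3: convert the generated motion into physically feasible robot actions."""
    return [Action(gripper_pose=(0, 0, 0, 0, 0, 0)) for _ in video]

def video_to_action_pipeline(image: bytes, prompt: str) -> List[Action]:
    video = generate_task_video(image, prompt)
    scene = reconstruct_scene(image)
    return extract_actions(video, scene)

if __name__ == "__main__":
    actions = video_to_action_pipeline(b"rgb image", "put the mug on the shelf")
    print(f"{len(actions)} zero-shot actions planned")
```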
@cheryyun_l
Yongyuan Liang
10 days
Unified multimodal models can generate text and images, but can they truly reason across modalities? 🎨 Introducing ROVER, the first benchmark that evaluates reciprocal cross-modal reasoning in unified models, the next frontier of omnimodal intelligence. 🌐 Project:
5
29
236
@PointsCoder
Jiageng Mao
24 days
Emily is presenting her first-ever paper at #ICCV2025. Come by and have a chat with her!
@JiaEmily84473
Emily Jia
24 days
🚀 Excited to share our #ICCV2025 paper (@yuewang314 @PointsCoder): "Learning an Implicit Physics Model for Image-based Fluid Simulation" 🌊 We present a physics-informed neural network that generates 4D, physically consistent fluid animations from a single image, guided by
0
0
7
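For readers unfamiliar with physics-informed networks, here is a generic, minimal sketch of the idea: a data-fitting loss plus a physics-residual loss, using incompressibility of a 2D velocity field as the physics term, written in PyTorch. This illustrates the general technique only, not the model from the paper above.

```python
# Generic physics-informed loss sketch (illustrative, not the paper's actual model):
# the network predicts a 2D velocity field and is penalized both for mismatching
# observations and for violating incompressibility (zero divergence).
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))

def physics_informed_loss(xyt: torch.Tensor, observed_uv: torch.Tensor) -> torch.Tensor:
    xyt = xyt.requires_grad_(True)
    uv = net(xyt)                                   # predicted velocity (u, v) at (x, y, t)
    data_loss = torch.mean((uv - observed_uv) ** 2)
    # Divergence du/dx + dv/dy should be ~0 for incompressible flow.
    du = torch.autograd.grad(uv[:, 0].sum(), xyt, create_graph=True)[0]
    dv = torch.autograd.grad(uv[:, 1].sum(), xyt, create_graph=True)[0]
    divergence = du[:, 0] + dv[:, 1]
    physics_loss = torch.mean(divergence ** 2)
    return data_loss + physics_loss

# One optimization step on random stand-in data:
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss = physics_informed_loss(torch.rand(32, 3), torch.rand(32, 2))
loss.backward()
opt.step()
```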
@UUUUUsher
Quankai Gao
29 days
🚀 Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion. ✅ Python + GPU-optimized implementation, no C++ anymore! ✅ 40× faster than COLMAP with 5K images on a single GPU! ✅ Scales beyond 100 images (more than VGGT/VGGSfM can consume)! ✅ Supports metric scale.
5
47
351
@PointsCoder
Jiageng Mao
1 month
Check out our new humanoid whole-body manipulation dataset!
@zhenyuzhao123
Zhenyu Zhao
1 month
Introducing 🚀 Humanoid Everyday: a large, real-world dataset for humanoid whole-body manipulation. Unlike most humanoid data (fixed bases, narrow tasks), ours covers diverse, locomotion-integrated skills. 🔗 Website: https://t.co/0wmXltt13R 📄 Paper: https://t.co/lt8V6HZIO3
1
3
38
@PointsCoder
Jiageng Mao
2 months
Check out our work on leveraging Internet images for robotic manipulation!
@SihengZhao
Siheng Zhao
2 months
(1/n) Ever wondered if a single in-the-wild image could generate photorealistic robotic demonstrations? 🖼️ 🔥 Excited to share our #CoRL2025 paper, Robot Learning from Any Images (RoLA), a framework that transforms any in-the-wild image into an interactive, physics-enabled
0
0
7
@yuewang314
Yue Wang
3 months
🚀 Join Us: Research Internships in Embodied Intelligence. The USC Geometry, Vision, and Learning Lab (https://t.co/MP3PFbYx2L) is seeking highly motivated interns to push the frontiers of AI, robotics, and 3D computer vision. You'll work on large-scale VLA models,
7
25
191
@PointsCoder
Jiageng Mao
9 months
This project is co-led by our incredible intern Wei Chow and me, and I am especially grateful to my advisor, @yuewang314, for his invaluable guidance and support throughout this work. 🙏 We also deeply appreciate the contributions and insights of @Boyiliee, @DanielSeita, and
0
0
9
@PointsCoder
Jiageng Mao
9 months
How do we fix this? Introducing PhysAgent 🚀 – a new framework that enhances VLMs by integrating: 🔹 Vision foundation models (Depth, SAM, GroundingDINO) 🔹 A physics knowledge memory for improved reasoning 🔹 Chain-of-thought inference for self-verification PhysAgent boosts
1
1
14
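The ingredients listed above (foundation-model tools, a physics knowledge memory, and chain-of-thought self-verification) can be combined in a simple agent loop. The sketch below is hypothetical: the tool set, memory format, and prompting scheme are placeholders, not the actual PhysAgent implementation.

```python
# Hypothetical sketch of a PhysAgent-style wrapper around a VLM.
# Tool names, memory format, and prompts are illustrative placeholders.
from typing import Callable, Dict, List

def run_vision_tools(image, tools: Dict[str, Callable]) -> Dict[str, str]:
    """Run vision foundation-model tools (e.g. depth, segmentation, grounding) on the image."""
    return {name: tool(image) for name, tool in tools.items()}

def retrieve_physics_knowledge(question: str, memory: List[str]) -> List[str]:
    """Naive keyword retrieval from a physics knowledge memory."""
    return [fact for fact in memory if any(w in fact for w in question.lower().split())]

def answer_with_physics_agent(vlm: Callable[[str], str], image, question: str,
                              tools: Dict[str, Callable], memory: List[str]) -> str:
    tool_obs = run_vision_tools(image, tools)
    facts = retrieve_physics_knowledge(question, memory)
    prompt = (
        f"Question: {question}\n"
        f"Tool observations: {tool_obs}\n"
        f"Relevant physics facts: {facts}\n"
        "Think step by step, then verify your answer before replying."
    )
    draft = vlm(prompt)                                             # chain-of-thought draft
    verdict = vlm(f"Check this reasoning for physical consistency:\n{draft}")
    return verdict                                                  # self-verified answer

if __name__ == "__main__":
    stub_vlm = lambda p: "answer (stub)"
    stub_tools = {"depth": lambda img: "near", "segmentation": lambda img: "2 objects"}
    physics_memory = ["mass resists acceleration", "soft objects deform under load"]
    print(answer_with_physics_agent(stub_vlm, object(), "Which object is heavier?",
                                    stub_tools, physics_memory))
```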
@PointsCoder
Jiageng Mao
9 months
What did we find? 🧐 We evaluated 75 top VLMs, including GPT-4o, Gemini, and open-source models, and found: ✅ Strong commonsense reasoning but poor physical reasoning ✅ Closed-source models outperform open-source ones, but still struggle ✅ Scaling data and model size does not
1
2
13
@PointsCoder
Jiageng Mao
9 months
What is PhysBench? PhysBench is a comprehensive benchmark with 10,002 video-image-text entries that assess VLMs across four major domains: 1️⃣ Physical object properties (number, mass, stiffness, elasticity, etc.) 2️⃣ Physical object relationships (distances, depths, velocities,
1
0
11
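To make the benchmark structure concrete, the sketch below shows what a video-image-text entry and per-domain accuracy scoring could look like. The field names and evaluation loop are illustrative assumptions, not the released PhysBench data format.

```python
# Hypothetical sketch of a PhysBench-style entry and per-domain scoring.
# Field names and the evaluation loop are assumptions for illustration only.
from dataclasses import dataclass
from collections import defaultdict
from typing import Callable, Iterable

@dataclass
class BenchEntry:
    video_path: str      # optional video clip
    image_path: str      # key frame or standalone image
    question: str        # text query about the physical scene
    answer: str          # ground-truth choice
    domain: str          # e.g. "object_property", "object_relationship", ...

def evaluate(model: Callable[[BenchEntry], str], entries: Iterable[BenchEntry]) -> dict:
    """Compute accuracy per physical-reasoning domain."""
    correct, total = defaultdict(int), defaultdict(int)
    for entry in entries:
        total[entry.domain] += 1
        if model(entry).strip().lower() == entry.answer.strip().lower():
            correct[entry.domain] += 1
    return {d: correct[d] / total[d] for d in total}

if __name__ == "__main__":
    demo = [BenchEntry("v.mp4", "f.png", "Which ball is heavier?", "A", "object_property")]
    print(evaluate(lambda e: "A", demo))   # {'object_property': 1.0}
```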
@PointsCoder
Jiageng Mao
9 months
Can Vision-Language Models (VLMs) truly understand the physical world? 🌍🔬 Introducing PhysBench – the first benchmark to evaluate VLMs' understanding of physics! PhysBench is accepted to #ICLR2025 as an Oral presentation (only 1.8% out of 11k submissions)! 🌐 Project:
5
72
413