Yixuan Wang Profile
Yixuan Wang

@YXWangBot

Followers: 1K · Following: 1K · Media: 44 · Statuses: 177

CS Ph.D. student @Columbia working on robotics | Worked at Boston Dynamics AI Institute, Google X #Vision #Robotics #Learning

New York, USA
Joined October 2019
@YXWangBot
Yixuan Wang
6 months
🤔 Active robot exploration is critical but hard – long horizons, large spaces, and complex occlusions. How can a robot explore like a human? 🤖 Introducing CuriousBot, which interactively explores and builds an actionable 3D relational object graph. 🔗👇 Thread (1/9)
2
6
39
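The tweet doesn't spell out what an "actionable 3D relational object graph" looks like, so here is a minimal sketch of one plausible reading: objects as nodes with 3D poses, spatial relations as edges, and a frontier of unexplored objects driving the next action. All names (ObjectNode, Relation, frontier) are illustrative assumptions, not CuriousBot's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str
    position: tuple[float, float, float]  # object center in the world frame
    explored: bool = False                # has the robot inspected it yet?

@dataclass
class Relation:
    source: str  # e.g. "mug"
    target: str  # e.g. "cabinet"
    kind: str    # e.g. "inside", "on_top_of", "occluded_by"

@dataclass
class RelationalObjectGraph:
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    edges: list[Relation] = field(default_factory=list)

    def add_object(self, node: ObjectNode) -> None:
        self.nodes[node.name] = node

    def relate(self, source: str, target: str, kind: str) -> None:
        self.edges.append(Relation(source, target, kind))

    def frontier(self) -> list[ObjectNode]:
        """Unexplored objects: candidates for the next exploratory action."""
        return [n for n in self.nodes.values() if not n.explored]

# As the robot opens the cabinet, it adds what it finds behind the occlusion.
graph = RelationalObjectGraph()
graph.add_object(ObjectNode("cabinet", (0.5, 0.0, 0.4), explored=True))
graph.add_object(ObjectNode("mug", (0.5, 0.1, 0.3)))
graph.relate("mug", "cabinet", "inside")
print([n.name for n in graph.frontier()])  # ['mug']
```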
@YXWangBot
Yixuan Wang
7 days
RT @shivanshpatel35: 🚀 Introducing RIGVid: Robots Imitating Generated Videos! Robots can now perform complex tasks—pouring, wiping, mixing—…
0
31
0
@YXWangBot
Yixuan Wang
16 days
RT @YunzhuLiYZ: Had a great time yesterday giving three invited talks at #RSS2025 workshops—on foundation models, structured world models,…
0
10
0
@YXWangBot
Yixuan Wang
18 days
Just arrived in LA and excited to be at RSS! I will present CodeDiffuser at the following sessions:
- Presentation on June 22 (Sun.) 5:30 PM - 6:30 PM
- Poster on June 22 (Sun.) 6:30 PM - 8:00 PM
I will also present CuriousBot at:
- FM4RoboPlan Workshop on June 21 (Sat.) 9:40 - 10:10
@YXWangBot
Yixuan Wang
19 days
RT @YunzhuLiYZ: We’ve been exploring 3D world models with the goal of finding the right recipe that is both: (1) structured—for sample effi…
0
11
0
@YXWangBot
Yixuan Wang
19 days
RT @robo_kat: How can we achieve both common sense understanding that can deal with varying levels of ambiguity in language and dextrous ma…
0
5
0
@YXWangBot
Yixuan Wang
19 days
RT @kaiwynd: Check out the cool results and demo!
0
1
0
@YXWangBot
Yixuan Wang
19 days
Two releases in a row from our lab today 😆 One problem I have always pondered is how to use structured representations while keeping them scalable. Super excited that Kaifeng's work pushes this direction forward, and I cannot wait to see what comes next!
@kaiwynd
Kaifeng Zhang
19 days
Can we learn a 3D world model that predicts object dynamics directly from videos? Introducing Particle-Grid Neural Dynamics: a learning-based simulator for deformable objects that trains from real-world videos. Website: ArXiv:
0
0
9
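For readers unfamiliar with the particle/grid split that this line of simulators builds on, here is a minimal sketch, assuming NumPy and made-up parameters: particles carry the object state, a background grid aggregates their velocities, and each particle is advanced from the smoothed local field. This is the generic scatter/gather pattern, not the paper's learned model.

```python
import numpy as np

def particle_grid_step(pos, vel, cell=0.05, dt=0.01):
    """pos, vel: (N, 3) particle positions and velocities."""
    idx = np.floor(pos / cell).astype(int)       # grid cell per particle
    keys = [tuple(i) for i in idx]
    grid_vel, grid_cnt = {}, {}
    for k, v in zip(keys, vel):                  # scatter: particles -> grid
        grid_vel[k] = grid_vel.get(k, 0.0) + v
        grid_cnt[k] = grid_cnt.get(k, 0) + 1
    # gather: each particle reads back the averaged velocity of its cell
    smoothed = np.stack([grid_vel[k] / grid_cnt[k] for k in keys])
    return pos + dt * smoothed, smoothed

pos = np.random.rand(100, 3)          # toy particle cloud
vel = np.random.randn(100, 3) * 0.1
pos, vel = particle_grid_step(pos, vel)
```

In a learned simulator of this flavor, the hand-coded averaging above would be replaced by a trained network operating on the grid features; the sketch only shows the data flow.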
@YXWangBot
Yixuan Wang
19 days
RT @YunzhuLiYZ: **Steerability** remains one of the key issues for current vision-language-action models (VLAs). Natural language is often…
0
23
0
@YXWangBot
Yixuan Wang
19 days
RT @Haoyu_Xiong_: It is cool to see that you can steer your low-level policy with foundation models. Check out the new work from @YXWangBot.
0
1
0
@YXWangBot
Yixuan Wang
19 days
RT @RoboPapers: Ep#10 with @RogerQiu_42 on Humanoid Policy ~ Human Policy. Co-hosted by @chris_j_paxton & @micoolc…
0
6
0
@YXWangBot
Yixuan Wang
19 days
This work was done with the awesome Yitong and Guang! Thanks to the amazing collaborators Dale, Paarth, Kuni, @huan_zhang12, and Katherine for their support and contributions! Also a huge thanks to my incredible advisor @YunzhuLiYZ for the support and guidance, as always!
0
1
2
@YXWangBot
Yixuan Wang
19 days
Links: 🌐 Website: 🖥️ Code: 📺 Video: We will present the paper on June 22 (Sunday) from 5:30 PM to 6:30 PM, and the poster from 6:30 PM to 8:00 PM. See you in LA! (9/9)
1
1
5
@YXWangBot
Yixuan Wang
19 days
Our tasks involve contact-rich manipulation and multi-object interactions, enabled by a visuomotor policy learned from demonstrations. To stow the book, the robot first squeezes the book in, pushes other books aside to make space, and finally inserts it. (8/9)
1
0
1
@YXWangBot
Yixuan Wang
19 days
Geometric relations also matter for manipulation tasks. Our system can capture them, as evidenced by the following battery-packing task. The user can specify the target battery by its relative location, such as “frontmost” or “right column”. (7/9)
1
0
1
@YXWangBot
Yixuan Wang
19 days
Moreover, our framework can understand more fine-grained semantic information, such as names on book covers, and select the right book instance to stow. (6/9)
1
0
1
@YXWangBot
Yixuan Wang
19 days
Our system can interface with natural language, even with contextual descriptions and a self-correction structure. The following video showcases how it localizes the right instance and accomplishes the mug-hanging task. (5/9)
1
2
1
@YXWangBot
Yixuan Wang
19 days
The following video shows that when the instruction is more specific, our code can 1) align with the high-level instruction by selecting the correct instance and 2) steer the low-level visuomotor policy using the computed 3D attention maps. (4/9)
1
1
2
@YXWangBot
Yixuan Wang
19 days
We use code as an intermediate representation to connect high-level instructions with low-level actions, which is both interpretable and executable. The video below shows how we use code to compute 3D attention maps from language instructions. (3/9)
1
1
2
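As a rough illustration of the idea in this step (not CodeDiffuser's actual API), a VLM-generated snippet might resolve the instruction to a specific object instance and emit a soft 3D attention volume for the low-level policy. The function attention_map, the detections list, and the workspace parameters below are all hypothetical.

```python
import numpy as np

def attention_map(target_xyz, grid_shape=(32, 32, 32), workspace=1.0, sigma=0.08):
    """Soft 3D attention over a voxel grid, peaked at target_xyz (meters)."""
    centers = np.indices(grid_shape).transpose(1, 2, 3, 0) + 0.5
    centers = centers / np.array(grid_shape) * workspace  # voxel centers in meters
    d2 = np.sum((centers - np.array(target_xyz)) ** 2, axis=-1)
    attn = np.exp(-d2 / (2 * sigma ** 2))                 # Gaussian attention blob
    return attn / attn.max()                              # normalize to [0, 1]

# What a generated snippet for "pick up the left mug" might look like:
# resolve "left" over the detected instances, then hand the policy a
# 3D attention map centered on the chosen one.
detections = [("mug", (0.2, 0.5, 0.1)), ("mug", (0.8, 0.5, 0.1))]
mugs = [xyz for label, xyz in detections if label == "mug"]
target = min(mugs, key=lambda p: p[0])  # "left" = smallest x in the world frame
attn = attention_map(target)            # consumed by the low-level policy
```

The appeal of code as the intermediate representation, as the tweet notes, is that this resolution step is both inspectable (you can read which instance was chosen and why) and directly executable.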
@YXWangBot
Yixuan Wang
19 days
Language can be ambiguous in specifying a task, and different levels of ambiguity lead to different task complexity, as shown below. CodeDiffuser systematically evaluates a visuomotor policy’s steerability under ambiguous instructions, a factor overlooked in prior work. (2/9)
1
2
1
@YXWangBot
Yixuan Wang
19 days
🤖 Do VLA models really listen to language instructions? Maybe not 👀 🚀 Introducing our RSS paper: CodeDiffuser -- using VLM-generated code to bridge the gap between **high-level language** and **low-level visuomotor policy**. 🎮 Try the live demo: (1/9)
1
26
126