Jensen Gao

@jensen_gao

Followers: 145 · Following: 10 · Media: 12 · Statuses: 17

CS PhD Student @StanfordAILab | Previously BS/MS @Berkeley_EECS

Stanford, CA
Joined April 2022
@jensen_gao
Jensen Gao
2 days
RT @_abraranwar: Are current eval/deployment practices enough for today’s robot policies? Announcing the Eval&Deploy workshop at CoRL 202…
0
13
0
@jensen_gao
Jensen Gao
1 year
We also show that with composition, we can transfer policies to entirely new settings (kitchens) with unseen combinations of many environmental factors. Policies trained on data collected without environmental variation, or without prior robot data, fail to transfer well. (7/8)
1
1
4
@jensen_gao
Jensen Gao
1 year
In our real experiments, we find that policies often do achieve composition when trained on data from our strategies. Importantly, we find that using large prior robot datasets (in our case, BridgeData V2) is critical for strengthening this. (6/8)
1
0
3
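As a rough illustration of the co-training idea, here is a minimal sketch of mixing target-task demos with a large prior robot dataset when sampling training batches. The 50/50 mixing ratio and all names (sample_batch, fork_demos, bridge_data) are assumptions for illustration, not the paper's recipe.

import random

def sample_batch(target_data, prior_data, batch_size, prior_frac=0.5):
    # Draw a fraction of each batch from the prior dataset (e.g., BridgeData V2)
    # and the rest from the target-task demos, then shuffle them together.
    n_prior = int(batch_size * prior_frac)
    batch = random.choices(prior_data, k=n_prior)
    batch += random.choices(target_data, k=batch_size - n_prior)
    random.shuffle(batch)
    return batch

# Hypothetical usage: batch = sample_batch(fork_demos, bridge_data, batch_size=256)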
@jensen_gao
Jensen Gao
1 year
If policies can compose these factors, we can exploit this during data collection. We propose strategies (Diagonal, L, Stair) that collect data (green) while prioritizing covering individual factors, such that composition could address their unseen combinations (pink). (5/8)
[Image: Diagonal, L, and Stair data-collection patterns over a factor grid; green = collected cells, pink = unseen combinations]
1
0
1
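One way to picture these strategies is as cell-coverage patterns over a two-factor grid. The sketch below is a hedged guess at what Diagonal, L, and Stair coverage could look like; the exact cell patterns are illustrative assumptions, not the paper's definitions.

def diagonal(n: int) -> set[tuple[int, int]]:
    # Vary both factors together: collect at (0,0), (1,1), ..., (n-1,n-1).
    return {(i, i) for i in range(n)}

def l_shape(n: int) -> set[tuple[int, int]]:
    # Vary one factor at a time while holding the other fixed at its first value.
    return {(i, 0) for i in range(n)} | {(0, j) for j in range(n)}

def stair(n: int) -> set[tuple[int, int]]:
    # Alternate single-factor steps so adjacent collected cells differ in one factor.
    cells, i, j = {(0, 0)}, 0, 0
    while i < n - 1 or j < n - 1:
        if i <= j and i < n - 1:
            i += 1
        else:
            j += 1
        cells.add((i, j))
    return cells

if __name__ == "__main__":
    n = 4  # e.g., 4 fork types x 4 table heights
    for name, fn in [("Diagonal", diagonal), ("L", l_shape), ("Stair", stair)]:
        seen = fn(n)
        unseen = {(i, j) for i in range(n) for j in range(n)} - seen
        print(f"{name}: collect {len(seen)} cells, rely on composition for {len(unseen)}")

Each strategy collects data in only O(n) of the n^2 factor combinations and relies on the policy composing factors to cover the rest.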
@jensen_gao
Jensen Gao
1 year
For example, consider the task of putting a fork in a container. Even for this relatively simple task, there can be considerable variation along multiple axes, such as the type of fork, or the table height. We conduct real experiments where we extensively vary such factors. (4/8)
1
0
2
@jensen_gao
Jensen Gao
1 year
However, training robot policies that can handle a wide variety of settings remains difficult. Data-driven methods have promise to scale and deal with this variation, but collecting robot data covering all desired combinations of environmental factors is often infeasible. (3/8)
1
0
2
@jensen_gao
Jensen Gao
1 year
Robotic tasks can vary in many ways, and handling them all can be challenging. Recent works (e.g., OXE) have scaled robot datasets to cover a diverse variety of environmental factors, such as those studied in the COLOSSEUM. (2/8)
robotics-transformer-x.github.io
Project page for Open X-Embodiment: Robotic Learning Datasets and RT-X Models
1
0
5
@jensen_gao
Jensen Gao
1 year
How should we efficiently collect robot data for generalization? We propose data collection procedures guided by the abilities of policies to compose environmental factors in their data. Policies trained with data from our procedures can transfer to entirely new settings. (1/8)
1
14
66
@jensen_gao
Jensen Gao
2 years
Thanks to amazing collaborators: @bidiptas13 @xf1280 @xiao_ted @jiajunwu_cs @brian_ichter @Majumdar_Ani @DorsaSadigh. Paper: Website: Talk:
drive.google.com
0
1
8
@jensen_gao
Jensen Gao
2 years
Finally, we evaluate our planner on real robot scenes, where we again find that PG-InstructBLIP significantly improves performance, succeeding on 9/10 tasks compared to 4/10 for the base VLM. (7/8)
1
1
7
@jensen_gao
Jensen Gao
2 years
To show the benefits of improved physical reasoning for robotics, we incorporate our VLM into an LLM-based planner and evaluate on 51 tasks across 8 diverse real world scenes. Using PG-InstructBLIP instead of the base VLM improves planning accuracy from 56.9% to 88.2%. (6/8)
1
0
7
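To give a feel for how a VLM's property estimates can ground an LLM planner, here is a minimal sketch. The vlm and llm callables stand in for real model calls (e.g., PG-InstructBLIP and an LLM API); their interfaces, the concept list, and the prompt format are assumptions for illustration, not the paper's planner.

CONCEPTS = ["material", "fragility", "contents"]  # assumed subset, for illustration

def describe_objects(vlm, image, objects):
    # Query the VLM once per (object, concept) pair and collect text facts.
    facts = []
    for obj in objects:
        for concept in CONCEPTS:
            answer = vlm(image, f"What is the {concept} of the {obj}?")
            facts.append(f"{obj}: {concept} = {answer}")
    return facts

def plan(llm, instruction, facts):
    # Prepend the grounded physical facts to the planning prompt.
    prompt = (
        "Objects and physical properties:\n"
        + "\n".join(facts)
        + f"\n\nInstruction: {instruction}\nPlan, step by step:"
    )
    return llm(prompt)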
@jensen_gao
Jensen Gao
2 years
We use PhysObjects to fine-tune InstructBLIP, a SOTA open-source VLM, to create PG-InstructBLIP. This significantly improves the VLM on our dataset, and slightly outperforms single concept fine-tuning, suggesting some transfer benefits from multi-concept VLM fine-tuning. (5/8)
1
0
6
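For readers wanting a concrete starting point, below is a minimal sketch of supervised fine-tuning InstructBLIP with Hugging Face Transformers on (image, question, answer) triples. The physobjects_triples iterable, the checkpoint choice, and the training recipe (what is frozen, loss masking, hyperparameters) are all assumptions, not the paper's exact setup.

import torch
from transformers import InstructBlipForConditionalGeneration, InstructBlipProcessor

name = "Salesforce/instructblip-vicuna-7b"  # assumed checkpoint
model = InstructBlipForConditionalGeneration.from_pretrained(name)
processor = InstructBlipProcessor.from_pretrained(name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for image, question, answer in physobjects_triples:  # hypothetical iterable
    # Condition on the question and supervise the full text; in practice the
    # question tokens would typically be masked out of the loss.
    inputs = processor(images=image, text=question + " " + answer,
                       return_tensors="pt")
    loss = model(**inputs, labels=inputs.input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()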
@jensen_gao
Jensen Gao
2 years
We use image data from EgoObjects, which consists of frames from egocentric video of objects in a wide variety of real household settings. We believe this makes PhysObjects particularly relevant for household robotics applications. (4/8)
1
1
6
@jensen_gao
Jensen Gao
2 years
To address this limitation, we propose PhysObjects, a dataset intended for fine-tuning VLMs to better capture object-centric physical reasoning. It consists of annotations for 8 different continuous and categorical physical concepts that are broadly relevant for robotics. (3/8)
1
0
6
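As a sketch of what one annotation record might look like: the thread mentions material and contents as concepts, while "mass" and "fragility" are assumed examples of continuous/categorical concepts, and the field names are hypothetical rather than the dataset's actual schema.

from dataclasses import dataclass
from typing import Union

@dataclass
class PhysicalAnnotation:
    image_id: str             # frame from EgoObjects
    object_id: str            # which object in the frame
    concept: str              # e.g., "material", "contents", "mass", "fragility"
    value: Union[str, float]  # categorical label or continuous estimate
    source: str               # "human" (36.9K) or "automatic" (417K)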
@jensen_gao
Jensen Gao
2 years
VLMs are rapidly improving and have shown promise for grounding in robotics, but they aren’t great at obtaining detailed info about objects in a scene. For example, when asked about the material and contents of different cups, they are sometimes right, but often wrong. (2/8)
1
1
7
@jensen_gao
Jensen Gao
2 years
Excited to release PhysObjects: a dataset of 36.9K human and 417K automatic physical concept annotations for images of common household objects. We use it to fine-tune VLMs to improve their physical reasoning, and leverage this for better robotics. (1/8)
2
16
101