@BerntBornich
Bernt Bornich
1 year
Lots of great hype about large multimodal models right now. But how do we get to trillions of tokens for embodied actions? My bet is VR teleoperation and shared autonomy.
13
25
157

Replies

@tom_doerr
Tom Dörr
1 year
@BerntBornich Why not pose estimation on videos?
1
0
1
@BerntBornich
Bernt Bornich
1 year
@tom_doerr Contact dynamics, preferred impedance, etc. are poorly represented in simulation and only indirectly in video. Physical interactions hide a lot more complexity than is obvious on the surface.
1
0
2
@chris_j_paxton
Chris Paxton
1 year
@BerntBornich I think this is way tougher than it seems. For the ChatGPT approach to work, we need lots of varied data in different environments. Obvious privacy + legal concerns might stop any commercial teleop system from achieving this.
1
0
1
@BerntBornich
Bernt Bornich
1 year
@chris_j_paxton Not in any way straightforward. Huge challenges to overcome with respect to privacy and legal. Not to mention safety. But as we are increasingly able to deploy droids with base autonomy + teleop, I think it has a path to collecting the diversity of data needed.
0
0
1
@HDPbilly
HDP
1 year
@BerntBornich Yep… or… cameras and sensor suits on actual laborers… probably cheaper and better
1
0
0
@BerntBornich
Bernt Bornich
1 year
@HDPbilly As long as you don't lose money on deploying the droids, by having some base autonomy and good teleop to cover the rest, you get cleaner data and a flywheel. Repeatedly deploying to the fleet to get the full evaluation loop requires a good fleet size.
0
0
0
@paravn
Parav
1 year
@BerntBornich Teleoperation is hard to scale up. Also, how do you generalize tokens from one hardware stack to another?
0
0
2
@PeterVilleroy
Peter
1 year
@BerntBornich Tell me more!
0
0
0
@dhanushisrad
Dhanush
1 year
@BerntBornich When android manufacturing starts scaling up fast, relying only on VR teleoperation to reach trillions of sensorimotor tokens will be expensive and slow. The simulator will help, but I think the key will be to amplify a tiny number of teleop demos with online RL for millions of…
1
0
1
@duncancalvert
Duncan Calvert
1 year
@BerntBornich @1x__tech If you use deep learning to correspond semantics with detected 3D environment features, you could possibly plug the semantics into LLMs, like "Approach the blue door with the weird handle."
0
0
0
@mataslauzadis
Matas Lauzadis
1 year
@BerntBornich Robotics companies should be partnering with factories to put motion capture devices on their workers
0
0
0
@utopiah
Fabien Benetou
1 year
@BerntBornich Getting there 😅
@utopiah
Fabien Benetou
1 year
It moves! Next step remote control then #WebXR.
2
1
13
0
0
0
@Yang_Supertramp
Yang Fan 范阳
1 year
@BerntBornich Is it lululemon or Under Armour he is wearing?
0
0
0