
Shreyas Gite
@shreyasgite
Followers: 1K
Following: 3K
Media: 197
Statuses: 1K
Data & Sims at https://t.co/8SkahBzqk0. Prev: founded self-driving at Kopernikus (acq. @Ford), jet engines @RollsRoyce.
Berlin, Germany
Joined September 2009
Gemini + π0 = actually useful robots! (Similar to what @physical_int did with "Hi Robot"). I can now verbally tell the robot that I'm building a red Lego wall or wooden tower, and it will infer the next steps by itself and pass me the necessary pieces, tools, or materials, ha!
8
30
269
RT @RisingSayak: Had the honor to present diffusion transformers at CS25, Stanford. The place is truly magical. Slides: .
0
128
0
Five golden nuggets from this talk: 1. Pretrained + finetuned vs. single-task policy: - Because it is trained on different tasks, the pretrained policy has more recovery behaviours. - Somehow the visual-action mapping across different tasks and environments leads to this behaviour.
Wow, thanks Ted! I could spend a week on this video from @RussTedrake - easily some of the densest learning material for anyone interested in Robotics.
0
13
72
Wow, thanks Ted! I could spend a week on this video from @RussTedrake - easily some of the densest learning material for anyone interested in Robotics.
If you’re working on robotics and AI, the recent Stanford talk from @RussTedrake on scaling multitask robot manipulation is a mandatory watch, full stop. No marketing, no hype. Just solid hypothesis-driven science and evidence-backed claims. A gold mine in today’s landscape!
2
3
21
RT @fenildoshi009: 🧵 What if two images have the same local parts but represent different global shapes purely through part arrangement? Hu….
0
108
0
RT @svlevine: Warm-start RL (WSRL) can learn to control a real robot in under 20 minutes! Deep RL is getting really fast. Warm-start from o….
0
63
0
Some of the components withheld from the Genesis open-source sim: 1. Generative World Builder: natural language → scene graph → curriculum. An LLM-driven pipeline that turns a text prompt (“crumpled towel on table, overhead cam”) into: ① a USD or other scene graph, ② physics-ready.
Physics should not be programmed; it should be learned! Meanwhile, until we get there -> Genesis is massive and awesome.
2
8
83
Came across a curious phenomenon: camera pose matters the most, be it normal policies, VLAs, or world models. Results from V-JEPA 2 are consistent with the Generalisation Gap paper. It kinda makes sense, as in end-to-end models the camera viewpoint affects how movements appear visually
Not all augmentations are equal -> which visual nuisance variables actually cause robotic imitation-learning policies to fail? One of my fav papers from Annie Xie, Lisa Lee, Chelsea Finn & Ted Xiao. The authors create a controlled benchmark that lets them toggle seven factors
0
0
3
RT @bilawalsidhu: This BlenderFusion paper basically says "screw trying to describe 3D edits through text" and just. use Blender :-) . Th….
0
44
0
If you are a curious person, you continue to change a lot irrespective of age. Also, in terms of AI: why train a single model when you can have an MoE with intertwined associative losses :D
Normal people dating advice: Don’t marry early if you’re growing and changing a lot every year. AI buddy (@YiTayML): You are like a neural net in the middle of training and loss is still improving. Better to train to convergence instead of taking an early checkpoint snapshot.
0
0
2
RT @LerrelPinto: More concretely, we first train a visual encoder using BC, with semantic augmentations from VLMs, enabling robust scene un….
0
1
0
Wow! Meshcapade is one awesome company, still flying under the radar of many.
Physical intelligence for humanist robots. At @meshcapade we've built the foundational technology for the capture, generation, and understanding of human motion. This blog post explains how this enables robot learning at scale.
1
2
16