Xiao Ma
@yusufma555
1K Followers · 287 Following · 22 Media · 93 Statuses
Staff Research Scientist @ ByteDance Seed, working on robot foundation models. Prev: @Dyson @NUSingapore @sjtu1896. All views are my own.
Singapore & Beijing
Joined August 2015
I've been working on deformable object manipulation since my PhD. It was a total nightmare years ago, and my PhD advisor told me not to work on it, for my own good. Today, at ByteDance Seed, we are dropping GR-RL, a new VLA+RL system that manages long-horizon precise
Thanks @_akhaliq again for sharing our work! For the full demo video, please find it in our thread:
GR-RL proves something: IL is inherently limited, and simply by adding RL we can do things previously thought impossible with purely visuomotor control. The future direction is clear: distill RL-enhanced behavior back into the foundation VLA, forming a self-improving,
The Result: 83.3% success on continuous shoelace threading. The surprising part? GR-RL learns to:
• retry when the lace slips
• reposition the lace when the initial pose is bad
• "self-correct" mid-task instead of freezing
This is the behavior you
Online Stage: Real-World Steering RL. Now the robot learns ON THE PHYSICAL PLATFORM. But direct exploration in joint space causes dangerous jitter and is inefficient when you need millimeter accuracy. So GR-RL explores in the latent noise space: a tiny 51.5M-parameter noise predictor
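A minimal sketch of why steering in noise space is safer than perturbing joint commands, assuming a frozen base policy that maps (observation, latent noise) to an action. The thread does not show GR-RL's actual networks, so `base_policy`, `NoiseSteerer`, its Gaussian noise head, and the update rule are all hypothetical stand-ins; the toy base policy bounds its output so that any sampled noise still yields an action near the nominal one.

```python
import math
import random


def base_policy(obs, noise):
    """Toy frozen base policy (hypothetical stand-in for the VLA).

    Maps an observation and a latent noise vector to an action. tanh bounds
    the noise's effect, so every noise value yields an action within 0.01 of obs.
    """
    return [o + 0.01 * math.tanh(z) for o, z in zip(obs, noise)]


class NoiseSteerer:
    """Hypothetical steering policy: it outputs latent noise, never raw actions."""

    def __init__(self, dim, sigma=0.3, seed=0):
        self.mean = [0.0] * dim  # learnable noise mean (a network head in practice)
        self.sigma = sigma       # exploration std, applied in noise space
        self.rng = random.Random(seed)

    def sample_noise(self):
        # Explore by perturbing the noise fed to the base policy.
        return [m + self.rng.gauss(0.0, self.sigma) for m in self.mean]

    def update(self, noise, advantage, lr=0.1):
        # REINFORCE-style step on the Gaussian mean: the gradient of
        # log N(z; m, sigma^2) w.r.t. m is (z - m) / sigma^2, and the
        # 1/sigma^2 factor is folded into lr here.
        self.mean = [m + lr * advantage * (z - m) for m, z in zip(self.mean, noise)]


def act(obs, steerer):
    # The steerer picks the noise; the frozen base policy turns it into an action.
    noise = steerer.sample_noise()
    return base_policy(obs, noise), noise
```

Because exploration only ever enters through the base policy, even a large `sigma` cannot push the commanded action outside what the base policy produces; perturbing joint targets directly offers no such guarantee.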
Morphological Symmetry Augmentation: Our bimanual robot is left-right symmetric, so we mirror EVERYTHING:
• RGB
• Proprioception
• Actions
• Language Instructions
Data size doubles, and spatial-reasoning robustness skyrockets → 72.7% success.
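The mirroring step can be sketched as below, assuming a single head-mounted camera, proprioception and action vectors laid out as [left arm, right arm] with `n_per_arm` values each, and a hypothetical `flip_sign` mask marking joints whose sign flips under reflection. The robot's real joint conventions are not given in the thread, and a setup with wrist cameras would also need to swap the left and right camera streams.

```python
import re


def mirror_image(image):
    # Horizontal flip: reverse each row of an H x W image given as nested lists.
    return [list(reversed(row)) for row in image]


def mirror_vector(vec, n_per_arm, flip_sign):
    """Swap left/right arm blocks and negate joints whose sign flips.

    Assumes vec = [left arm (n_per_arm values), right arm (n_per_arm values)];
    flip_sign is a hypothetical per-joint mask (True = negate after mirroring).
    """
    left, right = vec[:n_per_arm], vec[n_per_arm:2 * n_per_arm]

    def reflect(block):
        return [-x if f else x for x, f in zip(block, flip_sign)]

    return reflect(right) + reflect(left)


_SWAP = {"left": "right", "right": "left", "Left": "Right", "Right": "Left"}


def mirror_instruction(text):
    # Swap the words "left" and "right" in the language instruction.
    return re.sub(r"\b(left|right|Left|Right)\b", lambda m: _SWAP[m.group(0)], text)


def mirror_sample(sample, n_per_arm, flip_sign):
    # Mirror every modality consistently: RGB, proprioception, actions, language.
    return {
        "rgb": mirror_image(sample["rgb"]),
        "proprio": mirror_vector(sample["proprio"], n_per_arm, flip_sign),
        "action": mirror_vector(sample["action"], n_per_arm, flip_sign),
        "instruction": mirror_instruction(sample["instruction"]),
    }


def augment(dataset, n_per_arm, flip_sign):
    # Original samples plus their mirrored copies: the dataset size doubles.
    return dataset + [mirror_sample(s, n_per_arm, flip_sign) for s in dataset]
```

Mirroring twice returns the original sample, which is a cheap sanity check for the sign mask.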
Offline Stage: Filter the human flaws. We train a Critic Transformer via distributional RL:
⭐️ Detects “value drops” when the operator hesitates or messes up
⭐️ Slices every trajectory into high-value vs. low-value segments
⭐️ Retains only the cleanest expert behavior
Effect:
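The filtering idea can be sketched as follows. This is a minimal sketch, assuming per-step value estimates from a distributional critic whose output is a categorical distribution over fixed return atoms (C51-style). The drop threshold, the `window` rule deciding which steps count as "low-value", and all function names are hypothetical; the thread does not give GR-RL's actual criterion.

```python
def expected_value(probs, atoms):
    # Distributional critic: a categorical distribution over fixed return atoms.
    # The scalar value used for filtering is its expectation.
    return sum(p * z for p, z in zip(probs, atoms))


def keep_mask(values, drop_threshold, window):
    """Per-step keep/discard mask for one demonstration.

    A value drop at step t means values[t-1] - values[t] > drop_threshold.
    The `window` steps leading into each drop (t - window .. t - 1) are marked
    low-value and discarded; all other steps are kept.
    """
    keep = [True] * len(values)
    for t in range(1, len(values)):
        if values[t - 1] - values[t] > drop_threshold:
            for i in range(max(0, t - window), t):
                keep[i] = False
    return keep


def kept_segments(mask):
    # Group consecutive kept steps into (start, end) index pairs, end exclusive.
    segments, start = [], None
    for i, k in enumerate(mask + [False]):  # sentinel closes a trailing segment
        if k and start is None:
            start = i
        elif not k and start is not None:
            segments.append((start, i))
            start = None
    return segments
```

Only the kept segments would then be used for imitation training; the discarded ones are where the critic saw the expected return fall.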
The Idea: If imitation is broken, let the robot learn from its own experience. GR-RL =
⭐️ Offline RL (data filtering)
⭐️ Symmetry augmentation
⭐️ Online closed-loop real-world reinforcement learning
All on top of a single VLA foundation model. Just RGB, proprioception,
Two killers of imitation learning (IL):
(1) Human demos are NOT optimal. Humans hesitate, retry, and fix mistakes mid-trajectory. IL blindly copies ALL of it, including the bad parts.
(2) Training vs. deployment misalignment. VLA models output actions. To prevent jitter, robots
Why “shoelace threading” matters: This task is probably one of the most challenging household robotics tasks in terms of precision:
• Soft-body chaos: laces deform every frame
• Millimeter precision: a 1–2 mm slip = total failure
• Long-horizon manipulation: hundreds of
Well done, @stepjamUK and @Neuracore_AI! Getting started with robot learning is always frustrating, especially the infra. It creates an invisible barrier that keeps people from even entering this field. In an era where everyone is racing to build their next generalist robot,
Today we are excited to open up Neuracore to the academic community! Neuracore is a new data foundation built to accelerate robot learning by removing one of the field’s biggest bottlenecks: capturing and working with high-fidelity multimodal robotics data. For the first time,
Congrats, @stepjamUK! It has been great working with Stephen. Looking forward to the exciting research coming up! Don't hesitate to apply if you're currently seeking PhD opportunities!
As a newly appointed Assistant Professor at @imperialcollege, I'm thrilled to announce the Safe Whole-body Intelligent Robotics Lab (SWIRL) at Imperial College London. Safe Whole-body
ByteWrist shows how compact parallel wrists can bring robotic manipulation closer to human-level dexterity in tight spaces.
Read the paper: https://t.co/QZBiytM7Xp
Project page:
bytewrist.github.io
Results:
1. Higher integration and flexibility vs. Kinova systems
2. Stable roll-pitch-yaw (RPY) control
3. 116 hours of autonomous data collection for dexterous manipulation tasks
We built a 22-DoF dual-arm robot, ByteMini, powered by ByteWrist. It can:
✅ Maneuver in narrow glove-box spaces
✅ Grasp objects faster than Kinova wrists (≈2× speedup)
✅ Perform dual-arm deformable object manipulation (e.g., clothes hanging!)
✨ Key innovations:
1. Nested 3-stage parallel drive → compact + multi-DOF control.
2. Arc-shaped end linkages → optimized force transmission + wider range.
3. Central supporting ball joint → stiffness without sacrificing flexibility.
Traditional serial wrists = bulky + error-prone in clutter.
⚙️ Existing parallel wrists = stiff but not compact enough.
Neither works well in tight, human-like environments. ByteWrist solves this.
New paper alert! We introduce ByteWrist, a compact, anthropomorphic robotic wrist that enables dexterous manipulation in confined spaces. Think home service, medical robots, or precision assembly. https://t.co/WxBCeAMmpM
#ByteDanceSeed #EmbodiedAI #Robotics
Lastly, we are still hiring! Our team is mainly based in Beijing and Singapore. DM me if you are interested!
Flow-Based Policy for Online Reinforcement Learning
Problem: Standard RL policies often struggle to model complex, multimodal action spaces.
Our solution: We introduce FlowRL, a new framework that uses flow-based generative models to create highly expressive policies. By
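FlowRL's exact formulation is cut off in this excerpt. As a hedged illustration of what a flow-based policy means in general, the sketch below uses a standard conditional flow-matching construction (a linear path from Gaussian noise to an action) plus Euler integration at sampling time. `toy_velocity` is a hypothetical hand-written stand-in for a learned, state-conditioned network, chosen so that samples land on one of two modes, which a single Gaussian policy could not represent.

```python
import random


def flow_matching_target(x0, x1, t):
    """Training pair for a linear (rectified-flow style) path from noise x0 to action x1."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # d x_t / dt along the straight path
    return xt, velocity


def toy_velocity(action, t):
    # Hypothetical stand-in for a learned v_theta(a, t | s): each dimension is
    # pulled toward +1 or -1 depending on its sign, giving a bimodal policy.
    # (c - a) / (1 - t) is the exact velocity of the straight path ending at c.
    return [((1.0 if a >= 0 else -1.0) - a) / (1.0 - t) for a in action]


def sample_action(velocity_fn, dim, steps=10, rng=random):
    """Euler-integrate da/dt = v(a, t) from Gaussian noise at t=0 to an action at t=1."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for k in range(steps):
        t = k / steps
        v = velocity_fn(a, t)
        a = [ai + vi / steps for ai, vi in zip(a, v)]
    return a
```

Which mode a sample lands in is decided entirely by the initial noise, so the same network can express several distinct action choices for one state.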