Robbyant (@robbyant_brain)
Followers 746 · Following 38 · Media 15 · Statuses 26
@AntGroup Affiliate | Building Practical Embodied AI 🌟 Intelligence in Action, Benefits for Everyone
Joined January 2026
We're open-sourcing LingBot-VA to accelerate the development of embodied intelligence for the entire community. Let's build the future of robotics together. Check out the code, models, and our tech report: 🐙 Code: https://t.co/whqNyDZEN5 🤗 Hugging Face:
huggingface.co
0 · 0 · 14
🦾 From seeing to doing. We're closing the loop between video prediction and real-world action. On the final day of Robbyant Open Source Week, we bring you LingBot-VA—the world's first causal video-action world model for generalist robot control. 🔥 Key Highlights: 🤖 Predicts &
78 · 57 · 265
🔥 Very excited to share that we’re releasing LingBot-World 🌍 @robbyant_brain — an open-source frontier world model! We’re pushing the limits of: 🔹 High-Fidelity Simulation & Precise Control 🔹 Long-Horizon Consistency & Memory 🔹 Modeling Physical & Game Worlds The most
9 · 45 · 307
Thanks to AK for the rec! 👏 Dig into LingBot-VLA’s tech details → our technical report is up! 📄 #LingBotVLA #TechReport #VLA
A Pragmatic VLA Foundation Model https://t.co/iNv6rsn5KL
0 · 0 · 1
From perception (LingBot-Depth) to action (LingBot-VLA) to imagination (LingBot-World), we are building the foundational stack for embodied intelligence. Day 3 of our open-source week. Dive in: 🌐 Website: https://t.co/YP48j9M0S1 📑 Tech Report: https://t.co/t6ueL9dDH9 🐙 Code:
github.com
Robbyant/lingbot-world: Advancing Open-source World Models.
0 · 3 · 36
Zero-shot generalization: feed LingBot-World a single real-world photo or game screenshot, and it generates a fully interactive world—no scene-specific training needed. This is powered by our hybrid data strategy: large-scale web videos + game captures with clean, UI-free frames
2 · 2 · 23
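For readers who want to picture what "feed it a single photo" means in practice, here is a minimal sketch of the zero-shot pattern under our own assumptions. The `DummyWorldModel` class, its `reset`/`step` methods, and the `scene.jpg` path are placeholders, not LingBot-World's actual API; the real entry points are in the repository linked above.

```python
import numpy as np
from PIL import Image

class DummyWorldModel:
    """Stand-in for an image-conditioned interactive world model."""

    def reset(self, frame: np.ndarray) -> None:
        # A single RGB frame (photo or game screenshot) seeds the world state;
        # no scene-specific training or fine-tuning happens here.
        self.state = frame.astype(np.float32) / 255.0

    def step(self, command: str) -> np.ndarray:
        # A real model would decode the next frame conditioned on the text
        # command; this stub simply returns the current state unchanged.
        return (self.state * 255).astype(np.uint8)

# One real-world photo or game screenshot is the only conditioning input.
image = np.asarray(Image.open("scene.jpg").convert("RGB"))
world = DummyWorldModel()
world.reset(image)
frame = world.step("pan the camera to the left")
print(frame.shape)  # (H, W, 3)
```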
A true training ground must respond in real-time. LingBot-World achieves ~16 FPS throughput with under 1-second end-to-end latency. Control characters, adjust camera angles, or trigger environment changes via text commands—all with instant visual feedback. No pre-rendering, just
1 · 1 · 24
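As a rough client-side illustration of what a ~16 FPS interactive loop implies, here is a sketch built around a 1/16-second frame budget. The throughput and latency figures come from the post above; `EchoWorld`, its `step` method, and the pacing logic are our own stand-ins, not the released interface.

```python
import time

FRAME_BUDGET_S = 1.0 / 16  # ~62.5 ms per generated frame at ~16 FPS

def interact(world, commands):
    """Send text commands one by one, pacing output to the frame budget."""
    for command in commands:
        start = time.perf_counter()
        frame = world.step(command)  # text command in, next frame out
        elapsed = time.perf_counter() - start
        # Sleep off any slack so playback stays at a steady rate.
        time.sleep(max(0.0, FRAME_BUDGET_S - elapsed))
        yield frame

class EchoWorld:
    """Trivial stand-in so the loop runs end to end without a real model."""
    def step(self, command):
        return f"frame rendered for: {command}"

for frame in interact(EchoWorld(), ["walk forward", "pan camera left", "make it rain"]):
    print(frame)
```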
Long-term consistency is a known challenge for video generation—objects distort, scenes collapse. LingBot-World solves this with multi-stage training and parallel acceleration, enabling nearly 10 minutes of stable, continuous generation. Even after the camera looks away for 60
1 · 1 · 31
🌍 Reality is expensive. Simulation is the shortcut. But what if the simulation could think, respond, and remember? Today, we open-source LingBot-World, an interactive world model built on @Alibaba_Wan Wan2.2! 🔥 We’re pushing the limits of: 🔷 High-Fidelity Simulation & Precise Control 🔷 Long-Horizon Consistency & Memory 🔷 Modeling Physical & Game Worlds
53 · 80 · 446
🚀 Meet LingBot-VLA: A pragmatic Vision-Language-Action model designed to bridge the gap between perception and execution in robotics. 🤖 ✅LingBot-VLA-4B: Lightweight & versatile. https://t.co/eXdtpMhNfo ✅LingBot-VLA-4B-Depth: Enhanced for high-precision spatial tasks.
1 · 5 · 33
Thanks for the shout-out, @AdinaYakup! We're excited to share our LingBot-VLA and LingBot-Depth models with the community. Check out the technical reports and explore the models on Hugging Face. Stay tuned—something even more powerful is coming soon!
Ant Group is going big on robotics 🤖@robbyant_brain They just dropped their first VLA and depth perception foundation models on @huggingface ✨ LingBot-VLA: - Trained on 20k hours of real-world robot data - 9 robot embodiments - Clear no-saturation scaling laws - Apache 2.0
0 · 0 · 4
From LingBot-Depth to LingBot-VLA, we are building the foundational blocks for embodied intelligence. This is day 2 of our open-source week—stay tuned for more. Dive in and build with us: 🌐Website: https://t.co/SgNKPmWbKZ 📊 Datasets: https://t.co/mX8A919eS7 🐙 Code:
huggingface.co
0 · 2 · 15
Lowering the barrier to entry is critical. We're also open-sourcing our entire post-training toolchain. It's 1.5-2.8x more efficient than mainstream frameworks like StarVLA, enabling developers to fine-tune on their own tasks with significantly less data and compute.
1 · 0 · 12
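The toolchain itself is linked from the repo; as a generic picture of what VLA post-training boils down to, here is a minimal behavior-cloning loop over (feature, action) pairs. The 512-d fused features, the 7-DoF action targets, and the small MLP head are illustrative assumptions of ours, not the released framework's design.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: 64 demo steps with 512-d fused vision-language features
# and 7-DoF action targets (e.g., end-effector pose + gripper).
features = torch.randn(64, 512)
actions = torch.randn(64, 7)
loader = DataLoader(TensorDataset(features, actions), batch_size=16, shuffle=True)

policy_head = nn.Sequential(nn.Linear(512, 256), nn.GELU(), nn.Linear(256, 7))
optimizer = torch.optim.AdamW(policy_head.parameters(), lr=1e-4)

for epoch in range(3):
    for feats, target in loader:
        pred = policy_head(feats)
        loss = nn.functional.mse_loss(pred, target)  # behavior-cloning loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```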
We also integrated the spatial awareness of LingBot-Depth, which we released yesterday. By distilling depth information into our VLA via learnable queries, the average success rate on GM-100 climbs further to 17.3%. Combining action models with high-fidelity perception is a key
1 · 0 · 14
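A rough sketch of what "distilling depth via learnable queries" can look like: a small set of learned query vectors cross-attends to the depth encoder's tokens, and the resulting summary tokens are handed to the VLA. Token counts, dimensions, and the injection point are our assumptions; the actual design is in the tech report.

```python
import torch
from torch import nn

class DepthQueryFusion(nn.Module):
    def __init__(self, dim=768, num_queries=16, num_heads=8):
        super().__init__()
        # Learnable queries that summarize the depth encoder's output.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, depth_tokens: torch.Tensor) -> torch.Tensor:
        # depth_tokens: (batch, n_depth_tokens, dim) from a frozen depth model.
        batch = depth_tokens.shape[0]
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.cross_attn(q, depth_tokens, depth_tokens)
        return fused  # (batch, num_queries, dim), appended to the VLA tokens

depth_tokens = torch.randn(2, 196, 768)  # e.g. 14x14 patch features
fusion = DepthQueryFusion()
print(fusion(depth_tokens).shape)        # torch.Size([2, 16, 768])
```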
Performance across real and simulated environments: GM-100 real-robot benchmark (100 tasks, 3 robot embodiments): LingBot-VLA hits a 15.7% cross-embodiment success rate → outperforming Pi0.5 (13.0%). RoboTwin 2.0 (heavily randomized simulation): We hold a 9.92% lead in
1 · 0 · 14
🧠 What if one AI brain powers all robots? Retraining for every new embodiment is the biggest scaling pain in embodied AI—we’re fixing it. Today, we open-source LingBot-VLA: a Vision-Language-Action model built on @Alibaba_Qwen Qwen-2.5-VL and pre-trained on 20,000 hours of real-world robot data.
60 · 60 · 248
Very excited to share our first public release after I joined @robbyant_brain! We present LingBot-Depth 👀 — a state-of-the-art depth foundation model trained with RGB-D MAE on millions of real & simulated RGBD pairs. 🔹 Camera depths as natural masks for RGB-D MAE modeling 🔹
9 · 49 · 376
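One way to read "camera depths as natural masks": regions where the depth camera returns no signal (transparent, reflective, or out-of-range surfaces) define which patches the masked autoencoder must reconstruct. The patch size, threshold, and zero-means-invalid convention below are illustrative assumptions; the exact scheme is in the LingBot-Depth report.

```python
import numpy as np

PATCH = 16

def depth_validity_mask(depth: np.ndarray, min_valid_ratio: float = 0.5) -> np.ndarray:
    """Return a (H/PATCH, W/PATCH) boolean grid: True = patch is mostly invalid."""
    h, w = depth.shape
    patches = depth[: h - h % PATCH, : w - w % PATCH]
    patches = patches.reshape(h // PATCH, PATCH, w // PATCH, PATCH)
    valid_ratio = (patches > 0).mean(axis=(1, 3))  # sensor zeros = missing depth
    return valid_ratio < min_valid_ratio            # mask the unreliable patches

# Toy depth map with a hole, e.g. a transparent object the sensor cannot read.
depth = np.random.uniform(0.5, 3.0, size=(224, 224)).astype(np.float32)
depth[64:128, 64:160] = 0.0
mask = depth_validity_mask(depth)
print(mask.shape, int(mask.sum()), "patches masked")
```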
@robbyant_brain has open-sourced LingBot-Depth, a spatial intelligence model that lets robots "see" the unseeable. 🚀By aligning RGB & Depth latent spaces, it achieves reliable grasping of transparent/reflective objects where traditional sensors fail. Key Breakthroughs: ✅SOTA
0 · 3 · 26
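The "aligning RGB & Depth latent spaces" idea can be pictured with a simple paired-embedding alignment loss. The cosine objective below is one common choice used here purely for illustration; LingBot-Depth's actual training objective is described in its tech report.

```python
import torch
import torch.nn.functional as F

def latent_alignment_loss(rgb_latents: torch.Tensor, depth_latents: torch.Tensor) -> torch.Tensor:
    # Both inputs: (batch, dim) pooled features from the RGB and depth encoders.
    rgb = F.normalize(rgb_latents, dim=-1)
    depth = F.normalize(depth_latents, dim=-1)
    # Maximize cosine similarity of paired embeddings (minimize 1 - cos).
    return (1.0 - (rgb * depth).sum(dim=-1)).mean()

rgb_latents = torch.randn(8, 256)
depth_latents = torch.randn(8, 256)
print(latent_alignment_loss(rgb_latents, depth_latents))
```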
We believe in building in the open. Today, we're releasing it all: 🌐Website: https://t.co/HX1KOigCqq 🐙Code: https://t.co/0cVHhGyoqy 📑Tech Report: https://t.co/O88ficd6oS 🤗HuggingFace: https://t.co/Dk0nlLgFCD 🤖ModelScope: https://t.co/tRvawRXVtc And we're not stopping. Stay tuned.
huggingface.co
2 · 1 · 22
This isn't just about grasping. LingBot-Depth provides a robust foundation for a wide range of spatial intelligence tasks. This includes more accurate 3D indoor mapping, improved camera pose and trajectory estimation, and reliable 4D point tracking of dynamic objects—all within a
2 · 0 · 22