Ranjay Krishna
@RanjayKrishna
Followers: 6K · Following: 4K · Media: 190 · Statuses: 2K
Assistant Professor @ University of Washington, Co-Director of RAIVN lab (https://t.co/f0BWKyjoeA), Director of PRIOR team (https://t.co/l9RzTesMSM)
California, USA
Joined August 2011
🎉 Excited to share that our paper “Convergent Functions, Divergent Forms” will be presented at NeurIPS 2025🤖 in San Diego! We present LOKI, a compute-efficient framework for co-evolving robot morphologies🦾 and control policies⚙️. LOKI discovers diverse, high-performing robot …
Replies: 1 · Reposts: 6 · Likes: 11
🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 https://t.co/2GPHnsPq7R (1/16)
Replies: 25 · Reposts: 60 · Likes: 267
MolmoAct is an incredibly thought-provoking paper; VLAs are currently poor at reasoning, especially at the kinds of spatial reasoning robots need to do in order to avoid collisions and perform tasks successfully.
Reasoning models have massively expanded what LLMs are capable of, but this hasn’t necessarily applied to robotics. Perhaps this is in part because robots need to reason over space, not just words and symbols; so the robotics version of a reasoning model would need to think in …
Replies: 3 · Reposts: 31 · Likes: 194
@RanjayKrishna's stellar talk on video. @ICCVConference
🌺 Join us in Hawaii at ICCV 2025 for the workshop “What is Next in Multimodal Foundation Models?”
🗓️ Monday, October 20 | 8:00–12:00 📍 Room 326 B
We’ve got a stellar lineup of speakers & panelists — details here: 🔗 https://t.co/t2DmcZAlWM
Replies: 0 · Reposts: 1 · Likes: 8
Don’t miss out on working with an exciting new professor!
Excited to be at #ICCV2025 in Hawaii!🌴 I'll present two papers: M3DocVQA/M3DocRAG (Mon) and CAPTURe (Tue). Check our poster sessions👇 and feel free to ping me to grab a coffee together. I'm hiring PhD students to work on multimodal AI and robotics with me at JHU from Fall 2026!
Replies: 0 · Reposts: 1 · Likes: 20
World Model Reasoning for VLM Agents (NeurIPS 2025, Score 5544)
We release VAGEN to teach VLMs to build internal world models via visual state reasoning:
- StateEstimation: what is the current state?
- TransitionModeling: what is next?
MDP → POMDP shift to handle the partial …
Replies: 3 · Reposts: 68 · Likes: 298
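The VAGEN tweet above describes a two-stage "visual state reasoning" recipe (StateEstimation, then TransitionModeling) before the agent commits to an action. Below is a minimal, hypothetical Python sketch of what such a loop could look like; `query_vlm` and the prompt templates are placeholders of mine, not the released VAGEN code.

```python
# Illustrative sketch of two-stage visual state reasoning (StateEstimation,
# then TransitionModeling) before acting, as outlined in the VAGEN tweet.
# NOT the released VAGEN code: `query_vlm` and the prompts are placeholders.

def query_vlm(image, prompt: str) -> str:
    """Placeholder for a call to a vision-language model."""
    raise NotImplementedError

def vagen_style_step(observation_image, instruction: str) -> str:
    # 1) StateEstimation: ask the VLM to describe the current (partially
    #    observed) state instead of mapping pixels straight to an action.
    state = query_vlm(
        observation_image,
        f"Task: {instruction}\nDescribe the current state of the scene.",
    )

    # 2) TransitionModeling: ask the VLM to predict the state that would
    #    result from the action it is about to take (an internal world model).
    plan = query_vlm(
        observation_image,
        f"Current state: {state}\n"
        "Propose the next action and predict the resulting state.",
    )

    # 3) Only then emit an action. Under partial observability (the POMDP
    #    view in the tweet), the estimated state plus the predicted
    #    transition stand in for the unobserved true state.
    return query_vlm(
        observation_image,
        f"Current state: {state}\nPlan: {plan}\nOutput the single next action.",
    )
```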
Happy to share that my work at Salesforce Research, now named ☕LATTE – Learning to Think with Vision Specialists – has been accepted to EMNLP 2025 and selected for an Oral presentation! 😎 We propose “learning to reason with vision specialists”: rather than distilling both …
🌮 Introducing 🌮 TACO - our new family of multimodal action models that combine reasoning with real-world actions to solve complex visual tasks!
📊 Results:
- 20% gains on MMVet
- 3.9% average improvement across 8 benchmarks
- 1M+ synthetic CoTA traces in training
🔓🔓🔓 Fully …
Replies: 3 · Reposts: 16 · Likes: 95
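The TACO tweet above mentions training on 1M+ synthetic CoTA (chain-of-thought-and-action) traces. As a rough illustration only, such a trace might interleave reasoning steps with tool calls before a final answer; the schema, tool names, and rendering below are assumptions of mine, not TACO's actual data format.

```python
# Hypothetical shape of a chain-of-thought-and-action (CoTA) training trace:
# interleaved thoughts and tool/action calls ending in a final answer.
# Field names and tools are illustrative assumptions, not TACO's schema.

cota_trace = {
    "question": "How many mugs are on the shelf in the image?",
    "steps": [
        {"type": "thought", "text": "Localize the shelf region first."},
        {"type": "action", "tool": "crop", "args": {"box": [40, 10, 320, 180]}},
        {"type": "thought", "text": "Count mug detections inside the crop."},
        {"type": "action", "tool": "detect", "args": {"label": "mug"}},
        {"type": "thought", "text": "Three detections above threshold."},
    ],
    "answer": "3",
}

def render_for_training(trace: dict) -> str:
    """Flatten a CoTA trace into a single supervision string."""
    lines = [f"Question: {trace['question']}"]
    for step in trace["steps"]:
        if step["type"] == "thought":
            lines.append(f"Thought: {step['text']}")
        else:
            lines.append(f"Action: {step['tool']}({step['args']})")
    lines.append(f"Answer: {trace['answer']}")
    return "\n".join(lines)

print(render_for_training(cota_trace))
```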
Introducing ManiFlow 🤖, a visual imitation learning policy for general robot manipulation that is efficient, robust, and generalizable:
- 98.3% improvement on 8 real-world tasks, generalizing to novel objects & backgrounds
- Applied to diverse embodiments: single-arm, bimanual & …
Replies: 8 · Reposts: 64 · Likes: 219
We have now open-sourced the checkpoints from all training stages, along with the full training and fine-tuning code. Check it out here:
github.com/allenai/molmoact
Official Repository for MolmoAct.
Reasoning is central to purposeful action. Today we introduce MolmoAct — a fully open Action Reasoning Model (ARM) for robotics. Grounded in large-scale pre-training with action reasoning data, every predicted action is interpretable and user-steerable via visual trace. We are …
Replies: 10 · Reposts: 39 · Likes: 254
A picture is now worth more than a thousand words in genAI; it can be turned into a full 3D world! And you can stroll through this garden for as long as you like; it will still be there.
Replies: 147 · Reposts: 341 · Likes: 3K
AI2's MolmoAct model ‘thinks in 3D’ to challenge Nvidia and Google in robotics AI https://t.co/DZXw7YHDML
Replies: 1 · Reposts: 2 · Likes: 11
Most AI models still think in words. People, without even noticing, think with their bodies, planning how to move, grasp, and use the things around them. MolmoAct brings that to robotics: reasoning in space before acting. This is how we will get to the GPT-moment for robotics.
🤖✨ What if models that take action in the physical world could think through your instructions? Meet MolmoAct, our new fully open Action Reasoning Model (ARM) that does just that. 🧵
Replies: 1 · Reposts: 14 · Likes: 72
This to me really feels like how robot foundation models "should" work. I like that it can autoregressively predict depth tokens, lift to 2.5D, and use this for reasoning - it feels like a true robotics analogue of modern reasoning LLMs. Really exciting work.
Reasoning is central to purposeful action. Today we introduce MolmoAct — a fully open Action Reasoning Model (ARM) for robotics. Grounded in large-scale pre-training with action reasoning data, every predicted action is interpretable and user-steerable via visual trace. We are …
Replies: 4 · Reposts: 19 · Likes: 187
✨Thrilled to see our perception tokens used in robotics: MolmoAct predicts depth tokens first, then plans trajectories and actions. Love this direction for grounded action reasoning. Check out the perception tokens here:
🤖✨ What if models that take action in the physical world could think through your instructions? Meet MolmoAct, our new fully open Action Reasoning Model (ARM) that does just that. 🧵
Replies: 0 · Reposts: 2 · Likes: 20
We are launching MolmoAct🤖✨ A fully open Action Reasoning Model (ARM) that can reason in space: it perceives → it plans → it acts. 🧵👇
🤖✨ What if models that take action in the physical world could think through your instructions? Meet MolmoAct, our new fully open Action Reasoning Model (ARM) that does just that. 🧵
Replies: 3 · Reposts: 10 · Likes: 45
MolmoAct: Action Reasoning Models that can Reason in Space
depth → trajectory → actions
- Backbone: Molmo VLM (OpenCLIP/OLMo2-7B or SigLIP2/Qwen2.5-7B) + ordinal action tokens (256 bins, 5.4× less pretrain compute)
- Data: 10.6k Franka trajectories (93 tasks) + OXE subset …
Replies: 2 · Reposts: 5 · Likes: 51
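The spec tweet above mentions ordinal action tokens with 256 bins, emitted after the depth and trajectory reasoning steps. Below is a minimal sketch of that discretization idea; the normalized [-1, 1] action range and the round-trip details are my assumptions, not MolmoAct's actual tokenizer.

```python
# Sketch of ordinal action tokenization: continuous robot actions are
# discretized into 256 bins (per the tweet) so a VLM can emit them as tokens.
# The [-1, 1] range and round-trip logic are illustrative assumptions.
import numpy as np

NUM_BINS = 256

def actions_to_tokens(actions: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Map continuous action values in [low, high] to integer bin ids in [0, 255]."""
    clipped = np.clip(actions, low, high)
    scaled = (clipped - low) / (high - low)              # -> [0, 1]
    return np.minimum((scaled * NUM_BINS).astype(int), NUM_BINS - 1)

def tokens_to_actions(tokens: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Invert the binning, returning each bin's center value."""
    centers = (tokens + 0.5) / NUM_BINS                  # -> (0, 1)
    return centers * (high - low) + low

# Round-trip example: a 7-DoF end-effector delta (xyz, rpy, gripper).
action = np.array([0.10, -0.25, 0.05, 0.0, 0.3, -0.1, 1.0])
tokens = actions_to_tokens(action)
print(tokens, tokens_to_actions(tokens))
```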
MolmoAct: Action Reasoning Models that can Reason in Space
"Reasoning is central to purposeful action, yet most robotic foundation models map perception and instructions directly to control, which limits adaptability, generalization, and semantic grounding. We introduce …"
Replies: 3 · Reposts: 19 · Likes: 164
Reasoning is central to purposeful action. Today we introduce MolmoAct — a fully open Action Reasoning Model (ARM) for robotics. Grounded in large-scale pre-training with action reasoning data, every predicted action is interpretable and user-steerable via visual trace. We are …
🤖✨ What if models that take action in the physical world could think through your instructions? Meet MolmoAct, our new fully open Action Reasoning Model (ARM) that does just that. 🧵
Replies: 15 · Reposts: 77 · Likes: 466