Ai2 (@allen_ai) · Seattle, WA · Joined September 2015
78K Followers · 3K Following · 646 Media · 3K Statuses
Breakthrough AI to solve the world's biggest problems. › Join us: https://t.co/MjUpZpKPXJ › Newsletter: https://t.co/k9gGznstwj
Last year Molmo set SOTA on image benchmarks + pioneered image pointing. Millions of downloads later, Molmo 2 brings Molmo’s grounded multimodal capabilities to video 🎥—and leads many open models on challenging industry video benchmarks. 🧵
🚨 New Model Update: @Allen_AI's Olmo-3.1-32B-Think is now available in the Text Arena! This open model is designed to perform strongly on reasoning, instruction following, and research-focused tasks. Bring your toughest prompts and see how it compares as community votes roll in.
Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
Multi-turn report generation is live starting today. Try it at https://t.co/pCUcqGnLgb 💬 We're eager to hear how you use it + what would make it more useful → join our Discord & give feedback https://t.co/GnxLPhM3MW
📱 We've also improved Asta on mobile. Evidence now appears in cards instead of pop-ups that crowd the screen, navigation is smoother, & reports stream in without refreshing the page every time a new section appears.
📚 Reports draw from 108M+ abstracts and 12M+ full-text papers. Every sentence is cited, with citation cards that let you inspect sources, open the full paper, or view highlighted text where licensing allows. If data isn't in our library, Asta labels it as model-generated.
With multi-turn conversations, you can turn complex prompts into iterative investigations—adjusting scope, focus, or angle as you go. Ask follow-ups without losing context or citations, @-mention specific papers, & regenerate reports while keeping earlier drafts.
🆕 New in Asta: multi-turn report generation. You can now have back-and-forth conversations with Asta, our agentic platform for scientific research, to refine long-form, fully cited reports instead of relying on single-shot prompts.
@OpenRouterAI @huggingface 🔗 Olmo 3.1 32B Think API: https://t.co/G4yiIwyF78 🔗 Olmo 3.1 32B Instruct API: https://t.co/58fp0yTfBX Thanks to our partners @parasail_io, Public AI, & @Cirrascale 🤝
Now you can use our most powerful models via API. Olmo 3.1 32B Think, our reasoning model for complex problems, is on @OpenRouterAI—free through 12/22. And Olmo 3.1 32B Instruct, our flagship chat model with tool use, is available through @huggingface Inference Providers. 👇
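A minimal sketch of calling the hosted models: OpenRouter exposes an OpenAI-compatible endpoint, so the standard `openai` client works once pointed at it. The model slug and environment-variable name below are assumptions for illustration, not confirmed identifiers; check the linked API pages for the exact values.

```python
# Sketch: querying Olmo 3.1 32B Think via OpenRouter's OpenAI-compatible API.
# The model slug and env var name are assumptions -- verify on openrouter.ai.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="allenai/olmo-3.1-32b-think",  # hypothetical slug
    messages=[{"role": "user", "content": "Walk me through why sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```

Olmo 3.1 32B Instruct on @huggingface Inference Providers can be reached similarly via the `huggingface_hub` InferenceClient.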
We’re excited to see what the community builds with any-horizon video agents like SAGE. 🚀 🌐 Project page: https://t.co/ww72CPVNvQ 💻 Code: https://t.co/KAzalzqipi ⬇️ Models & data: https://t.co/7qGZmT4n8p 📝 Paper: https://t.co/FGmib07MJ2
SAGE hits ~68% accuracy on SAGE-Bench in roughly 8–9 seconds per video. Other agent systems often take tens of seconds to minutes to answer a video-related question, yet still trail SAGE in accuracy.
On SAGE-Bench with Qwen3-VL-8B, SAGE agents stay close to the direct baseline on short clips while pulling ahead on longer videos. Longer videos require more reasoning turns, but reinforcement learning cuts the number of turns needed compared with a supervised-fine-tuning-only approach.
We curate SAGE-Bench, a manually verified 1.7K-question benchmark of entertainment videos with an average duration of >700 seconds, focused on open-ended and practical questions—unlike existing MCQ and diagnostic benchmarks.
Under the hood, SAGE-MM is the orchestrator, deciding when to call tools (e.g., web search) vs. give an answer. It's trained on ~6.6K YouTube videos (~99K Q&A pairs, 400K+ state-action examples) using a multi-reward RL recipe for any-horizon open-ended reasoning.
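To make that control flow concrete, here is an illustrative any-horizon agent loop of the kind described: at each turn the orchestrator either inspects more of the video, calls a tool, or commits to an answer. All class, method, and tool names are hypothetical, not the actual SAGE-MM interface.

```python
# Illustrative any-horizon agent loop: the orchestrator model chooses, turn by
# turn, between viewing another clip, calling a tool, or answering. Every name
# here is hypothetical -- a sketch, not the real SAGE-MM code.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    question: str
    observations: list = field(default_factory=list)

def answer_video_question(question, video, orchestrator, tools, max_turns=16):
    state = AgentState(question=question)
    for _ in range(max_turns):
        # Map (question, observations so far) to the next action.
        action = orchestrator.decide(state)  # e.g. {"type": "view_clip", "start": 0, "end": 10}
        if action["type"] == "answer":
            return action["text"]
        if action["type"] == "view_clip":
            frames = video.sample(action["start"], action["end"])
            state.observations.append(("clip", frames))
        elif action["type"] in tools:  # e.g. "web_search", "transcript_search"
            state.observations.append((action["type"], tools[action["type"]](action["query"])))
    # Turn budget exhausted: force a direct answer from what was gathered.
    return orchestrator.decide(state, force_answer=True)["text"]
```

The multi-reward RL recipe described above is what keeps a loop like this short on easy clips and only lets it run longer when the video actually demands it.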
Most video reasoning models answer a question about a video in a single turn after ingesting many frames. SAGE instead examines short scenes & then jumps to later or earlier parts, & can also search transcribed audio or the web to obtain additional info about the target video.
🎥 Introducing SAGE, an agentic system for long video reasoning on entertainment videos—sports, vlogs, & more. It learns when to skim, zoom in, & answer questions directly. On our SAGE-Bench eval, SAGE with a Molmo 2 (8B)-based orchestrator lifts accuracy from 61.8% → 66.1%. 🧵
Molmo 2 brings true openness to video + multi-image understanding! For multi-image, we're releasing Molmo2-SynMultiImageQA: 1M+ synthetic text-rich images (charts, docs, etc.). Huge shoutout to my Ai2 teammates, let's keep pushing open science! Data: huggingface.co
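If you want to pull the multi-image data into a training or eval script, a hedged sketch with the `datasets` library is below; the repo id is a guess built from the dataset name in the post, so confirm the exact path on huggingface.co first.

```python
# Sketch: loading Molmo2-SynMultiImageQA with the Hugging Face `datasets` library.
# The repo id is a guess from the dataset name -- confirm it on huggingface.co.
from datasets import load_dataset

ds = load_dataset("allenai/Molmo2-SynMultiImageQA", split="train")  # hypothetical repo id
print(ds[0])  # expect multi-image inputs paired with text-rich QA annotations
```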
Molmo 2 doesn't just answer questions about clips—it searches & points. The model returns coordinates & timestamps over videos + images, powering QA, counting, dense captioning, artifact detection, & subtitle-aware analysis. You can see exactly how it reasoned.
Adding tracking capability to Molmo2 was a fun experience! Molmo2 can track objects and assign IDs in text: “<tracks coords= t1 id1 x1 y1 id2 x2 y2…>” Demo: https://t.co/NWs16uViAH Rundown: https://t.co/Ko23RYFx81 Tips for best tracking 🧵👇 (Note: cup video is 2x speed)
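For readers who want to consume those track strings programmatically, here is a rough parser for the format as sketched in the post (a `<tracks coords= ...>` tag holding a timestamp followed by repeated id/x/y triples). The exact serialization Molmo 2 emits may differ; this only mirrors the pattern shown above.

```python
# Rough parser for the tracking format sketched above: a <tracks coords= ...>
# tag whose payload is a timestamp followed by (id, x, y) triples. The real
# Molmo 2 serialization may differ -- this mirrors the pattern in the post.
import re

def parse_tracks(text):
    tracks = []
    for payload in re.findall(r"<tracks coords=\s*([^>]*)>", text):
        nums = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", payload)]
        if len(nums) < 4:  # need at least a timestamp plus one (id, x, y) triple
            continue
        t, rest = nums[0], nums[1:]
        points = [
            {"id": int(rest[i]), "x": rest[i + 1], "y": rest[i + 2]}
            for i in range(0, len(rest) - 2, 3)
        ]
        tracks.append({"t": t, "points": points})
    return tracks

print(parse_tracks("<tracks coords= 1.0 1 120 80 2 340 200>"))
# -> [{'t': 1.0, 'points': [{'id': 1, 'x': 120.0, 'y': 80.0}, {'id': 2, 'x': 340.0, 'y': 200.0}]}]
```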
🎗️ Reminder: our Molmo 2 and Olmo 3 Reddit AMA begins soon at 1pm PST / 4pm EST.
Check out what Molmo can do now.