Ai2

@allen_ai

77K Followers · 3K Following · 627 Media · 3K Statuses

Breakthrough AI to solve the world's biggest problems.
Join us: https://t.co/MjUpZpKPXJ
Newsletter: https://t.co/k9gGznstwj

Seattle, WA
Joined September 2015
@allen_ai
Ai2
3 days
Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
@allen_ai
Ai2
24 minutes
We're excited about what's next, including scaling byteifying to larger models. 📝 Blog: https://t.co/pQS0CJpQ9d ⬇️ Download Bolmo 7B: https://t.co/KMaN1gND4C | 1B: https://t.co/LUObc0BRCw 📄 Report:
@allen_ai
Ai2
24 minutes
On our eval suite & character-focused benchmarks like CUTE & EXECUTE, Bolmo matches/surpasses subword models while excelling at character-level reasoning. Once you byteify a base model, you can import capabilities from post-trained checkpoints via weight arithmetic.
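For intuition, here is a minimal sketch of the general weight-arithmetic (task-vector) idea the tweet alludes to: treat post-training as a parameter delta and add it to the byteified base. Function and variable names are hypothetical, and this is not necessarily Ai2's exact recipe.

```python
import torch

def import_capabilities(byteified_base: dict, subword_base: dict,
                        subword_posttrained: dict) -> dict:
    """Task-vector-style weight arithmetic (illustrative only):
    graft the post-training delta onto the byteified base model.
    Only parameters shared by all three state dicts are touched."""
    merged = dict(byteified_base)
    for name, base_w in subword_base.items():
        if name in subword_posttrained and name in merged:
            delta = subword_posttrained[name] - base_w  # what post-training learned
            merged[name] = merged[name] + delta         # apply it to the byte model
    return merged

# Hypothetical usage with torch.load'ed state dicts:
# merged_sd = import_capabilities(bolmo_sd, olmo_base_sd, olmo_instruct_sd)
```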
@allen_ai
Ai2
24 minutes
We keep Olmo 3's original backbone & capabilities, adding a lightweight byte stack so Bolmo can reason over bytes without discarding prior work. The result: a byte-level model with Olmo 3's strengths + finer-grained text understanding. 🚀
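As a rough sketch of what a "lightweight byte stack" in front of a retained backbone could look like (module names and sizes are hypothetical, not the actual Bolmo architecture):

```python
import torch
import torch.nn as nn

class ByteStack(nn.Module):
    """Hypothetical byte front-end: embed raw UTF-8 bytes, mix them with a
    small local encoder, then hand hidden states to the retained backbone.
    An illustration only, not Ai2's actual Bolmo architecture."""
    def __init__(self, d_model: int, n_local_layers: int = 2):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_model)  # one row per byte value
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.local_encoder = nn.TransformerEncoder(layer, num_layers=n_local_layers)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        # (batch, n_bytes) -> (batch, n_bytes, d_model), ready for the backbone
        return self.local_encoder(self.byte_embed(byte_ids))

# ids = torch.tensor([list("hello".encode("utf-8"))])
# hidden = ByteStack(d_model=512)(ids)  # then feed the pretrained Olmo layers
```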
@allen_ai
Ai2
24 minutes
Bolmo takes an existing Olmo 3 7B checkpoint and retrofits it into a fast, flexible byte-level architecture. It skips hand-engineered vocabularies and operates directly on UTF-8 bytes, handling spelling, edge cases, & multilingual scripts naturally.
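A quick illustration of why no hand-engineered vocabulary is needed: UTF-8 already maps any text, in any script, to IDs in the range 0..255, losslessly.

```python
def to_byte_ids(text: str) -> list[int]:
    """A byte-level 'tokenizer' needs no learned vocabulary:
    every string maps to IDs in 0..255 via UTF-8."""
    return list(text.encode("utf-8"))

print(to_byte_ids("naïve"))   # [110, 97, 195, 175, 118, 101]; 'ï' spans two bytes
print(to_byte_ids("日本語"))   # nine bytes, three per CJK character
print(bytes(to_byte_ids("naïve")).decode("utf-8"))  # lossless round trip
```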
@allen_ai
Ai2
24 minutes
Most LMs still speak in subword tokens (e.g., ▁inter + national + ization). They work, but struggle with character-level edits, whitespace, rare words, & multilingual support—and every token gets the same compute, regardless of complexity.
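To make the contrast concrete, a small sketch using the tweet's own example (the subword pieces are hardcoded for illustration):

```python
# The tweet's example, as SentencePiece-style pieces (hardcoded, illustrative):
subword_pieces = ["▁inter", "national", "ization"]

# A character-level question ("how many n's?") crosses piece boundaries,
# so a subword model never observes individual letters directly.
word = "internationalization"
print(word.count("n"))  # 4; trivial on characters, awkward across opaque pieces

# Byte-level input exposes every character to the model:
byte_ids = list(word.encode("utf-8"))
print(len(byte_ids), byte_ids[:5])  # 20 bytes, one per ASCII letter
```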
@allen_ai
Ai2
24 minutes
Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵
@allen_ai
Ai2
3 days
@AllenInstitute NeuroDiscoveryBench was built on openly available Allen Institute datasets—resources that have become foundational for the field. We're inviting researchers to help advance AI-assisted neuroscience discovery. 🔬 📂 Dataset: https://t.co/EHgjB3dYdi 📝 Read more:
@allen_ai
Ai2
3 days
@AllenInstitute We also found that raw, unprocessed datasets were much harder for AI agents, which struggled with the data transformations + complex joins required before analysis could even begin. Data wrangling remains a major challenge for AI in biology.
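A toy example of the kind of wrangling the tweet describes, using pandas with hypothetical column names: raw measurements must be joined with metadata before any real analysis can start.

```python
import pandas as pd

# Hypothetical miniature of the wrangling step: raw recordings must be
# joined with cell metadata before any analysis can begin.
recordings = pd.DataFrame({
    "cell_id": [101, 102, 103],
    "firing_rate_hz": [4.2, 9.8, 1.1],
})
metadata = pd.DataFrame({
    "cell_id": [101, 102, 103],
    "brain_region": ["VISp", "VISp", "CA1"],
})

merged = recordings.merge(metadata, on="cell_id", how="inner")
print(merged.groupby("brain_region")["firing_rate_hz"].mean())
```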
@allen_ai
Ai2
3 days
@AllenInstitute The answers to questions in NeuroDiscoveryBench can't be retrieved from memory or web search. AI systems have to actually analyze the data. Our baseline tests confirm this—models without data access score poorly, while data analysis agents perform substantially better. 📈
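Schematically, the baseline contrast looks something like the sketch below (all names hypothetical): both conditions answer the same questions, but only one can actually run the analysis.

```python
def score(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Exact-match accuracy over the benchmark's question-answer pairs."""
    correct = sum(predictions[q] == a for q, a in gold.items())
    return correct / len(gold)

# Hypothetical harness: same questions, with and without the data.
# closed_book = {q: llm_answer(q) for q in gold}             # memory/web only
# with_data   = {q: agent_answer(q, dataset) for q in gold}  # runs the analysis
# print(score(closed_book, gold), score(with_data, gold))
```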
@allen_ai
Ai2
3 days
@AllenInstitute NeuroDiscoveryBench includes ~70 question-answer pairs drawn from major Allen Institute publications. These aren't simple factoid questions—they require deep data analysis to answer.
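A hedged sketch of consuming such a benchmark; the actual file layout in allenai/neurodiscoverybench may differ, and the filename and field names here are assumptions.

```python
import json

# Filename and field names are assumed; check the repo for the real layout.
with open("tasks.json") as f:
    tasks = json.load(f)

for task in tasks[:3]:
    # Each entry pairs a research question with a data-grounded answer.
    print(task.get("question"), "->", task.get("answer"))
```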
@allen_ai
Ai2
3 days
🧠 Introducing NeuroDiscoveryBench. Built with @AllenInstitute, it’s the first benchmark for evaluating AI systems like our Asta DataVoyager agent on neuroscience data. The benchmark tests whether AI can truly extract insights from complex brain datasets.
@VictoriaWGraf
Victoria Graf @NeurIPS ☀️
3 days
Olmo 3 Instruct is now bigger and better 🚀 Olmo 3 Think? Better too. Check out Olmo 3.1! ✨
Quoted tweet (@allen_ai, Ai2, 3 days):
Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
@faeze_brh
Faeze Brahman
3 days
We’re dropping Olmo 3.1 as a little end-of-year surprise. Think of it as Olmo 3, but with holiday upgrades. 🎁🎄
Quoted tweet (@allen_ai, Ai2, 3 days):
Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
@hamishivi
Hamish Ivison
3 days
If u chatted to me at NeurIPS and I got distracted looking at my computer, it was cuz I was babysitting this run! Here are the full curves from our in-loop evaluations. Sit and wait and the model just gets better (no changes from the initial recipe we announced, just run for longer!)
Quoted tweet (@allen_ai, Ai2, 3 days):
Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
@allen_ai
Ai2
3 days
Alongside 3.1 Think & Instruct, we’re also upgrading our RL Zero 7B models for math & code with Olmo 3.1 RL Zero 7B Code & Olmo 3.1 RL Zero 7B Math. Both benefit from longer & more stable training runs—delivering stronger results + better baselines for RL researchers.
@allen_ai
Ai2
3 days
🛠️ Olmo 3.1 Instruct 32B is our best fully open 32B instruction-tuned model. It’s optimized for chat, tool use, & multi-turn dialogue—making it a much more performant sibling of Olmo 3 Instruct 7B and ready for real-world applications.
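For example, a standard transformers chat loop might look like the sketch below; the model ID is an assumption, so check the Olmo 3.1 collection on Hugging Face for the exact name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Olmo-3.1-Instruct-32B"  # assumed ID; verify on Hugging Face
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Briefly explain byte-level language models."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=200)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```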
@allen_ai
Ai2
3 days
🧠 After the initial Olmo 3 Think 32B release, we extended RL training for 21 days with extra epochs on our Dolci-Think-RL dataset. Olmo 3.1 Think 32B gains +5 AIME, +4 ZebraLogic, & +20 IFBench vs Olmo 3 Think 32B—making it the strongest fully open reasoning model.
@allen_ai
Ai2
7 days
Now anyone can use DataVoyager as a transparent AI partner for data-driven discovery. Try it at https://t.co/qjwKAgDm3D → select “Analyze data,” upload a dataset, & start asking questions. Learn more in our updated blog: https://t.co/qmLMNS2sTe