Luke Zettlemoyer
@LukeZettlemoyer
Followers 10K · Following 6K · Media 1 · Statuses 2K
Joined September 2015
Today, @NVIDIA is launching the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture. Super and Ultra are coming in the next few months.
27 replies · 153 reposts · 788 likes
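The "hybrid SSM Mixture of Experts" design above gets its inference-efficiency claim from sparse expert routing: only a few experts run per token. A minimal, illustrative top-k routing sketch in NumPy — the shapes, the top-2 choice, and the softmax-over-selected-experts gating are my assumptions for illustration, not Nemotron's actual design:

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Minimal top-k Mixture-of-Experts routing (illustrative only).

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                    # router score per expert
    topk = np.argsort(logits)[-k:]         # keep the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only k of the n experts execute for this token -- that sparsity is
    # where the inference-efficiency win of MoE comes from.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n = 8, 4
# Toy "experts": random linear maps, frozen at creation time via default args.
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n)]
y = moe_layer(rng.normal(size=d), rng.normal(size=(d, n)), experts, k=2)
print(y.shape)  # (8,)
```

Per token, only 2 of the 4 toy experts run; a real model adds load balancing and batched routing on top of this core idea.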
We are releasing Bolmo today! Bolmo is the best byte-level model so far. It comes close to and sometimes surpasses Olmo 3. Bolmo also performs competitively in terms of speed & is fully open. I was skeptical of byte-level models for a long time but I finally switched camps🧵
Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵
6 replies · 15 reposts · 68 likes
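The "byteifying" idea above can be illustrated without any model at all: a byte-level LM reads raw UTF-8 bytes, so its vocabulary is fixed at 256 symbols with no learned tokenizer and no out-of-vocabulary tokens. A minimal sketch (the example string and the trade-off note are mine, not from the Bolmo release):

```python
# Byte-level "tokenization": every string maps to a sequence over a fixed
# 256-symbol vocabulary -- no learned tokenizer, no out-of-vocabulary tokens.
text = "Olmo → Bolmo"
byte_ids = list(text.encode("utf-8"))
print(byte_ids)

# Round-trip is exact, including non-ASCII characters:
assert bytes(byte_ids).decode("utf-8") == text

# The trade-off: sequences get longer than with subwords (here 14 bytes
# for 12 characters, because "→" takes 3 bytes in UTF-8), which is why
# inference speed is the usual objection to byte-level models.
print(len(text), len(byte_ids))
```

Matching subword models on accuracy while staying competitive on speed, as the tweet claims, means absorbing that length overhead in the architecture.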
🚀 Olmo 3.1 is here — earlier than expected! 32B Think: 3 extra weeks of RL training = steady gains and significant improvements. 32B Instruct: Our 7B recipe scaled to 32B, tuned for short chat + function calling. Olmo 3 keeps leveling up! Details in the latest version of the …
Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
13 replies · 43 reposts · 426 likes
It's finally here: the public (and most complete) version of my talk covering every stage of building Olmo 3 Think. It covers changes and new considerations across every layer of the stack, from pretraining and evaluation to, of course, post-training.
12 replies · 81 reposts · 715 likes
My new blog post discusses the physical reality of computation and why this means we will not see AGI or any meaningful superintelligence:
timdettmers.com
If you are reading this, you probably have strong opinions about AGI, superintelligence, and the future of AI. Maybe you believe we are on the cusp of a transformative breakthrough. Maybe you are...
164 replies · 172 reposts · 1K likes
📢 New Paper 📢 Self-Improving VLM Judges Without Human Annotations
Reward models & judges are critical for evaluating output quality and alignment with human preferences for VLM training. Current training approaches typically rely on:
💸 Costly human preference annotations
🔒 …
2 replies · 20 reposts · 72 likes
Today we’re open-sourcing a preview of our two new models in the Isaac family: hybrid-reasoning 2B and 1B-parameter best-in-class vision-language models. Weights → https://t.co/1WgHMDfCST Blog → https://t.co/8MOLPKpUhO Demo → https://t.co/sAKt5dnZ6U
3 replies · 14 reposts · 53 likes
Can we simplify video generation by decomposing it into interleaved text-video co-generation? Would explicit, repeated thinking in language improve generation in pixels? We introduce TV2TV: a unified model that jointly learns
- language modeling (next-token prediction)
- video …
4 replies · 37 reposts · 86 likes
😂 Research is often very serious… but it doesn’t always have to be! If you are interested in computational humor, or just need a fun break from your current work, join us for MWAHAHA ⬇️ Tasks include both text and multimodal humor generation, in multiple languages
pln-fing-udelar.github.io
We want you to compete in making the funniest computer program.
1 reply · 8 reposts · 13 likes
I will be at #NeurIPS2025 12.3–12.7. Looking forward to meeting old and new friends! ☕️🌮 Recently I've been working on hallucination (Binary RAR) and verbatim memorization (ParaPO), issues that scaling up pretraining cannot simply fix. Also interested in making models learn more like …
1 reply · 5 reposts · 36 likes
Thrilled to share that @annadgoldie and I are launching @RicursiveAI, a frontier lab enabling recursive self-improvement through AIs that design their own chips. Our vision for transforming chip design began with AlphaChip, an AI for layout optimization used to design four …
wsj.com
Founded by ex-Google researchers, the company raised $35 million with backing from Sequoia to automate chip design.
Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at https://t.co/cSpbrQwwEn
123 replies · 136 reposts · 1K likes
I’m thrilled to announce that I’m launching a new startup dedicated to patient-centric AI for drug discovery, and we’re hiring Founding AI Engineers who are passionate about advancing healthcare through cutting-edge AI. Apply here by Jan 10:
2 replies · 34 reposts · 359 likes
🚀 Excited to share ToolOrchestra, an end-to-end RL training framework for orchestrating tools and agentic workflows. Everyone’s building agent workflows these days — connecting tools, APIs, and LLMs like LEGO. 🧩 But here are our findings: 👉 Just prompting the agent workflow …
24 replies · 68 reposts · 314 likes
Attending @NeurIPSConf and interested in distributed, modular, and/or open AI? I hadn't seen anyone put together a list of poster presentations in this area, so I took it upon myself to thread out who I'm excited to talk to next week🧵
5 replies · 5 reposts · 48 likes
Life update: I moved to Silicon Valley to tackle agents' biggest challenges: plasticity and reliability. Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two …
40 replies · 43 reposts · 421 likes
First time I'm this happy seeing negative results in a paper😆 moo moo rawrrrrrr!!!! The negative signal from training on spurious rewards shows successful decontamination of our training data. Things we can do only thanks to fully open data + model + training + eval✨
Because Olmo 3 is fully open, we decontaminate our evals from our pretraining and midtraining data. @StellaLisy proves this with spurious rewards: RL trained on a random reward signal can't improve on the evals, unlike some previous setups
2 replies · 10 reposts · 149 likes
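The decontamination check described in the quoted tweet can be sketched as follows. This is my illustrative reconstruction of the "spurious rewards" logic, not the paper's actual code, and the benchmark scores are placeholder numbers:

```python
import random

random.seed(0)

def spurious_reward(prompt: str, response: str) -> float:
    # Coin-flip reward, deliberately independent of the response's
    # correctness -- the "spurious rewards" setup.
    return float(random.random() < 0.5)

# The check, in outline:
# 1. RL-train the model against spurious_reward.
# 2. Re-run the benchmark.
# Any score gain cannot have come from the reward signal, so a gain
# suggests the benchmark answers leaked into the training data.
before = 0.42   # benchmark score pre-RL (placeholder)
after = 0.42    # score after RL on random rewards (placeholder)
contamination_suspected = after > before + 0.01   # tolerance for run noise
print(contamination_suspected)  # False -> consistent with clean data
```

On a contaminated setup, RL against this useless signal can still surface memorized answers and lift the score; flat scores are the expected outcome when the evals were properly scrubbed from pretraining and midtraining data, as the tweet reports for Olmo 3.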
One key to making a 🔥 LM: ☢️🧼remove benchmark contamination 📊🤔then make the right development decisions by not overestimating performance!
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
1 reply · 4 reposts · 33 likes
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
55 replies · 332 reposts · 2K likes
🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics:
- co-evolve with the policy model
- …
8 replies · 115 reposts · 542 likes
OpenAI's blog ( https://t.co/Mu05PFfPXg) points out that today’s language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility?🤔 On-policy RL with …
26 replies · 123 reposts · 674 likes
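The incentive problem in that tweet reduces to simple expected-value arithmetic: under accuracy-only grading, any guess with a nonzero chance of being right beats abstaining, which scores 0. A small sketch (the penalty parameter is my illustrative addition, not OpenAI's scoring rule):

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected benchmark score for answering, vs. 0 for saying 'I don't know'."""
    return p_correct * 1.0 + (1 - p_correct) * (-wrong_penalty)

p = 0.2  # model is only 20% sure of its guess

# Accuracy-only grading: guessing yields 0.2 in expectation, abstaining 0.0,
# so the graded-exam incentive is always to guess -- i.e., to hallucinate.
print(expected_score(p))                      # 0.2

# Penalizing confident errors flips the incentive: with a penalty of 1,
# guessing at 20% confidence now has negative expected score, so
# admitting uncertainty becomes the rational policy.
print(expected_score(p, wrong_penalty=1.0))   # -0.6
```

Training or evaluation schemes that reward calibrated abstention change this arithmetic, which is the lever the tweet's question about reducing hallucination without hurting utility points at.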