Uri Alon
@urialon1
3K Followers · 29K Following · 34 Media · 358 Statuses
Research Scientist @GoogleDeepMind
Pittsburgh, PA
Joined April 2013
"Gemini 3 flash achieved 90% acc @ 1 million ctx. ... what black magic did they do?" Yes, we've developed powerful new techniques for Gemini 3 long-context reasoning. Congratulations @urialon1 and the team!
For more context on OpenAI's MRCR benchmark, curated at https://t.co/k09cLyqfgr by @DillonUzar: Gemini 3 Flash achieved 90% acc @ 1 million ctx. This performance is SoTA across all models; most SoTA models can't even go past 256k ctx. At this length, you can't be using standard …
Context Arena Update: Added @GoogleDeepMind's Gemini 3 Flash Preview [12-17] to the OAI-MRCR leaderboards (2-, 4-, 8-needle)! This sets a new bar for the efficiency tier. With reasoning set to High, Gemini 3 Flash is effectively matching and, at ultra long context, even beating …
Gemini 3 Flash is live. ⚡️ We’ve packed Gemini 3’s Pro-grade reasoning into a leaner model with Flash-level latency, efficiency, and cost. It's my favorite model to use – the latency feels like a real conversation, with the deep intelligence intact. Available in the API, Gemini …
blog.google
Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
We’ve pushed out the Pareto frontier of efficiency vs. intelligence again. With Gemini 3 Flash ⚡️, we are seeing reasoning capabilities previously reserved for our largest models, now running at Flash-level latency. This opens up entirely new categories of near real-time …
Today we entered the Gemini 3 era, our next step on the path toward AGI. ⚡ Gemini 3 is our most intelligent model that combines capabilities like multimodality, long context and reasoning, so you can bring any idea to life. Explore more of what you can do and build with Gemini
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet. 🔦 Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the …
I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them. I'm late for #ICLR2025 and #NAACL2025, but just in time for #AISTATS2025 and timely for #ICML2025 acceptances! 1/4
Super honored to win the Language Modeling SAC award! I'll be presenting this work Wednesday in the 2pm poster session in Hall 3. Would love to chat with folks there, or at the rest of the conference, about long-context data, ICL, inference-time methods, New Mexican food, etc. :)
In-context learning provides an LLM with a few examples to improve accuracy. But with long-context LLMs, we can now use *thousands* of examples in-context. We find that this long-context ICL paradigm is surprisingly effective– and differs in behavior from short-context ICL! 🧵
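The many-shot setup the thread describes can be illustrated as plain prompt construction. This is a minimal sketch, not the paper's actual pipeline; `build_icl_prompt`, the example pairs, and the query are all hypothetical names and data.

```python
def build_icl_prompt(examples, query):
    """Concatenate labeled input/label pairs ahead of a new query (in-context learning)."""
    shots = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nLabel:"

# Classic few-shot ICL uses a handful of pairs; with a long-context LLM the same
# construction can pack in thousands of demonstrations instead.
few_shot = build_icl_prompt(
    [("great movie", "positive"), ("dull plot", "negative")],
    "loved every minute",
)
```

The only thing that changes between short- and long-context ICL in this sketch is the length of `examples` — the prompt format itself stays the same.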
We've been waiting for o3 and... it's worse than Gemini 2.5 Pro on deep understanding and reasoning tasks, while being 4.5x more expensive. Google is again in the lead for AGI (maybe for the first time since the Transformer release). Let me tell you, Google has been cooking.
✨ Ever tried generating an image from a prompt but ended up with unexpected outputs? Check out our new paper #FollowTheFlow - tackling T2I issues like bias, failed binding, and leakage from the textual encoding side! 💼🔍 https://t.co/jTNgec28hw
https://t.co/orB0Y7iW1S 🧵[1/7]
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer …
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →
I created a Python project starter repo for students that helps maintain good code quality while doing research projects: https://t.co/HRFdxAucsI I was opinionated and made only one choice for each tool, but there are other options too!
Ok, let's be honest, you really cooked with this one, Google 😗
*New ICLR paper* – We introduce a paradigm of *looped models for reasoning*. Main claims - Reasoning requires depth (via looping), not necessarily params - LLM reasoning predictably scales with more loops - Looped models generate “latent thoughts” & can simulate CoT reasoning 1/n
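The core idea above — depth via looping a shared block rather than adding parameters — can be sketched with a tiny weight-tied network. This is an illustrative toy, not the paper's architecture; the sizes, weights, and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))  # one shared block's weights

def looped_forward(x, loops):
    """Apply the same block repeatedly: effective depth comes from iteration count."""
    h = x
    for _ in range(loops):
        h = np.tanh(h @ W)  # weight-tied: the identical W is reused every loop
    return h

x = rng.normal(size=(8,))
shallow = looped_forward(x, 1)   # effective depth 1
deep = looped_forward(x, 12)     # effective depth 12, same parameter count
```

Increasing `loops` changes the computation without changing the parameter count, which is the sense in which reasoning depth can scale with loops rather than with model size.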
We’ve been *thinking* about how to improve model reasoning and explainability Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time …
Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥 We’re also releasing an improved version of our text-to-image model, Imagen 3 - available to use in ImageFX through …