Fangyu Liu
@hardy_qr
Followers: 1K · Following: 952 · Media: 26 · Statuses: 233
Research Scientist @GoogleDeepMind working on Gemini♊ pretraining. PhD @CambridgeLTL. BMath @UWaterloo. From Chengdu🐼. Opinions my own.
Mountain View, California
Joined February 2016
52.8 > 69.1 = 30.8 TIL
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →
Anyone who has been in this room knows that it’s never just another day in here! This space has seen the extremes of chaos and genius! ...and we ship! https://t.co/qcsBMdnlQA Happy Wednesday everyone!
Coding using @cursor_ai 0.45 with the @GoogleDeepMind (new) gemini-2.0-flash-thinking-exp model seems like the biggest step up in genai coding since Claude Sonnet 3.5 came out last June. This is unreal... forget about R1 folks - check out this new Gemini model! 🤯
Happy to see people like our hyperfitting paper. We are presenting it at ICLR 2025 in Singapore later this year 🇸🇬
This is my favorite paper of 2025 so far. "Hyperfitting": When a language model overfits (train loss -> 0, eval loss increases over time), greedy (top-1) decoding leads to high-quality and diverse (non-copied) generated samples. This is so counterintuitive, it feels magical.
Felix was someone we all looked up to in the lab. I'm really sad.
I’m really sad that my dear friend @FelixHill84 is no longer with us. He had many friends and colleagues all over the world - to try to ensure we reach them, his family have asked to share this webpage for the celebration of his life: https://t.co/1QoyHmAD3p
Appreciate @aidan_mclau looking into the thinking model results. Originally scores looked weak as the response was plucked from the thought content versus output. We are looking into ways of making thinking output less confusing for people running evals. This is why we 🚢, to
two aidanbench updates:
> gemini-2.0-flash-thinking is now #2 (explanation for score change below)
> deepseek v3 is #22 (thoughts below)
A good thinker doesn't necessarily have to underperform in other tasks 😉
Breaking news from Chatbot Arena⚡🤔 @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories! The leap from Gemini-2.0-Flash:
- Overall: #3 → #1
- Overall (Style Control): #4 → #1
- Math: #2 → #1
- Creative Writing: #2 → #1
- Hard Prompts: #1 → #1
What's your Final Answer?
We’ve been *thinking* about how to improve model reasoning and explainability Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time
A significant portion of what we read today is machine-generated. Fast forward a few years, it might be 95%+ machine-generated. It is a pretty fascinating experiment we are running. Are we as a species gonna mode-collapse, or self-improve?
A simple yet powerful example of the new Gemini 2.0 Flash's native multimodal input + output. Precise conversational editing & reasoning! Next step, Chess!
It's cool to see capabilities compounding. Progress on one front eventually accelerates progress on other fronts: ultra long-context, MM-in/out, reasoning/planning, agency, ... And it's all just one model!
Thrilled to kick off the Gemini 2.0 era with Gemini 2.0 Flash, an update to our workhorse model that outperforms even 1.5 Pro at twice the speed. It has really great multilingual skills, and can natively call tools, like Google Search. It’s the first release in the Gemini 2.0
Super excited for native image out to be released. Had the opportunity to work with a brilliant team to take this from idea to product over the past year. First going to early access partners, then more widely in early 2025. We'll be sharing some cool demos throughout the day
As our workhorse model, Gemini 2.0 Flash outperforms 1.5 Pro on key benchmarks, at twice the speed. It can generate images mixed with text as well as customizable text-to-speech multilingual audio. 2.0 Flash can also call tools like @Google Search, code execution and third-party
(I have these thoughts perhaps because I liked reading Kevin Kelly as a teenager. And yes I paid Elon 4 bucks to have enough token budget to post anything this long.)