Fangyu Liu

@hardy_qr

Followers
1K
Following
952
Media
26
Statuses
233

Research Scientist @GoogleDeepMind working on Gemini♊ pretraining. PhD @CambridgeLTL. BMath @UWaterloo. From 成都🐼. Opinions my own.

Mountain View, California
Joined February 2016
@hardy_qr
Fangyu Liu
3 months
52.8 > 69.1 = 30.8 TIL
@a__tomala
Alex Tomala
3 months
Graph generated with GPT-5
1
0
14
@hardy_qr
Fangyu Liu
3 months
good amount of visual hallucination!
@m__dehghani
Mostafa Dehghani
3 months
Hopefully this plot isn't the work of GPT-5… otherwise it needs a patch!
0
0
13
@ankesh_anand
Ankesh Anand
8 months
📈📈📈
@mbalunovic
Mislav Balunović
8 months
Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial amount of points (24.4%). The speed of progress is really mind-blowing.
12
26
347
@arena
lmarena.ai
8 months
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer
@GoogleDeepMind
Google DeepMind
8 months
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →
75
405
2K
@m__dehghani
Mostafa Dehghani
8 months
Anyone who has been in this room knows that it’s never just another day in here! This space has seen the extremes of chaos and genius! ...and we ship! https://t.co/qcsBMdnlQA Happy Wednesday everyone!
10
29
207
@MLStreetTalk
Machine Learning Street Talk
10 months
Coding using @cursor_ai 0.45 with the @GoogleDeepMind (new) gemini-2.0-flash-thinking-exp model seems like the biggest step up in genai coding since Claude Sonnet 3.5 came out last June. This is unreal... forget about R1 folks - check out this new Gemini model! 🤯
52
127
2K
@hardy_qr
Fangyu Liu
10 months
Happy to see people like our hyperfitting paper. We are presenting it at ICLR 2025 in Singapore later this year 🇸🇬
@jm_alexia
Alexia Jolicoeur-Martineau
10 months
This is my favorite paper of 2025 so far. "Hyperfitting": When a language model overfits (train loss -> 0, eval loss increases over time), greedy (top-1) decoding leads to high-quality and diverse (non-copied) generated samples. This is so counterintuitive, it feels magical.
2
4
52
@hardy_qr
Fangyu Liu
11 months
Felix was someone we all looked up to in the lab. I'm really sad.
@douwekiela
Douwe Kiela
11 months
I’m really sad that my dear friend @FelixHill84 is no longer with us. He had many friends and colleagues all over the world - to try to ensure we reach them, his family have asked to share this webpage for the celebration of his life: https://t.co/1QoyHmAD3p
0
0
20
@jack_w_rae
Jack Rae
11 months
Appreciate @aidan_mclau looking into the thinking model results. Originally scores looked weak as the response was plucked from the thought content versus output. We are looking into ways of making thinking output less confusing for people running evals. This is why we 🚢, to
@aidan_mclau
Aidan McLaughlin
11 months
two aidanbench updates: > gemini-2.0-flash-thinking is now #2 (explanation for score change below) > deepseek v3 is #22 (thoughts below)
5
9
103
@hardy_qr
Fangyu Liu
11 months
James made incredible contributions to the thinking models. Smart agents are only distillations of other smart agents.
@jamesjyan117153
James An
11 months
So proud to be one of the main core contributors on this effort! Really enjoyed making Gemini smarter!!!
1
0
31
@hardy_qr
Fangyu Liu
11 months
A good thinker doesn't necessarily have to underperform in other tasks 😉
@arena
lmarena.ai
11 months
Breaking news from Chatbot Arena⚡🤔 @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories! The leap from Gemini-2.0-Flash: - Overall: #3 → #1 - Overall (Style Control): #4 → #1 - Math: #2 → #1 - Creative Writing: #2 → #1 - Hard Prompts: #1 → #1
0
1
21
@hardy_qr
Fangyu Liu
11 months
What's your Final Answer?
@NoamShazeer
Noam Shazeer
11 months
We’ve been *thinking* about how to improve model reasoning and explainability Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try
0
0
2
@JeffDean
Jeff Dean
11 months
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time
127
478
4K
@hardy_qr
Fangyu Liu
11 months
A significant portion of what we read today is machine-generated. Fast forward a few years, it might be 95%+ machine-generated. It is a pretty fascinating experiment we are running. Are we as a species gonna mode-collapse, or self-improve?
2
0
4
@hardy_qr
Fangyu Liu
11 months
Me opening this app
1
0
3
@adonis_singh
adi
11 months
'massive organic church of gemini' - gemini 2.0 flash
9
8
148
@robertriachi
Robert Riachi
11 months
A simple yet powerful example of the new Gemini 2.0 Flash's native multimodal input + output. Precise conversational editing & reasoning! Next step, Chess!
23
47
411
@hardy_qr
Fangyu Liu
11 months
It's cool to see capabilities compounding. Progress on one front eventually accelerates progress on other fronts: ultra long-context, MM-in/out, reasoning/planning, agency, ... And it's all just one model!
@demishassabis
Demis Hassabis
11 months
Thrilled to kick off the Gemini 2.0 era with Gemini 2.0 Flash, an update to our workhorse model that outperforms even 1.5 Pro at twice the speed. It has really great multilingual skills, and can natively call tools, like Google Search. It’s the first release in the Gemini 2.0
0
0
13
@19kaushiks
Kaushik Shivakumar
11 months
Super excited for native image out to be released. Had the opportunity to work with a brilliant team to take this from idea to product over the past year. First going to early access partners, then more widely in early 2025. We'll be sharing some cool demos throughout the day
@GoogleDeepMind
Google DeepMind
11 months
As our workhorse model, Gemini 2.0 Flash outperforms 1.5 Pro on key benchmarks, at twice the speed. It can generate images mixed with text as well as customizable text-to-speech multilingual audio. 2.0 Flash can also call tools like @Google Search, code execution and third-party
2
6
107
@hardy_qr
Fangyu Liu
11 months
(I have these thoughts perhaps because I liked reading Kevin Kelly as a teenager. And yes I paid Elon 4 bucks to have enough token budget to post anything this long.)
1
0
5