Fangyu Liu @hardy_qr X Profile

Fangyu Liu

@hardy_qr

Followers

1K

Following

952

Media

26

Statuses

233

Research Scientist @GoogleDeepMind working on Gemini♊ pretraining. PhD @CambridgeLTL. BMath @UWaterloo. From 成都🐼. Opinions my own.

https://t.co/h3iZV0PSNM

Mountain View, California

Joined February 2016

Fangyu Liu

@hardy_qr

3 months

52.8 > 69.1 = 30.8 TIL

Alex Tomala

@a__tomala

3 months

Graph generated with GPT-5

1

0

14

Fangyu Liu

@hardy_qr

3 months

good amount of visual hallucination!

Mostafa Dehghani

@m__dehghani

3 months

Hopefully this plot isn't the work of GPT-5… otherwise it needs a patch!

0

13

Ankesh Anand

@ankesh_anand

8 months

📈📈📈

Mislav Balunović

@mbalunovic

8 months

Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve non-trivial amount of points (24.4%). The speed of progress is really mind-blowing.

12

26

347

lmarena.ai

@arena

8 months

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer

Google DeepMind

@GoogleDeepMind

8 months

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →

75

405

2K

Mostafa Dehghani

@m__dehghani

8 months

Anyone who has been in this room knows that it’s never just another day in here! This space has seen the extremes of chaos and genius! ...and we ship! https://t.co/qcsBMdnlQA Happy Wednesday everyone!

10

29

207

Machine Learning Street Talk

@MLStreetTalk

10 months

Coding using @cursor_ai 0.45 with the @GoogleDeepMind (new) gemini-2.0-flash-thinking-exp model seems like the biggest step up in genai coding since Claude Sonnet 3.5 came out last June. This is unreal... forget about R1 folks - check out this new Gemini model! 🤯

52

127

2K

Fangyu Liu

@hardy_qr

10 months

Happy to see people like our hyperfitting paper. We are presenting it at ICLR 2025 in Singapore later this year 🇸🇬

Alexia Jolicoeur-Martineau

@jm_alexia

10 months

This is my favorite paper of 2025 so far. "Hyperfitting": When a language model overfits (train loss -> 0, eval loss increases over time), greedy (top-1) decoding leads to high-quality and diverse (non-copied) generated samples. This is so counterintuitive, it feels magical.

2

4

52

Fangyu Liu

@hardy_qr

11 months

Felix was someone we all looked up to in the lab. I'm really sad.

Douwe Kiela

@douwekiela

11 months

I’m really sad that my dear friend @FelixHill84 is no longer with us. He had many friends and colleagues all over the world - to try to ensure we reach them, his family have asked to share this webpage for the celebration of his life: https://t.co/1QoyHmAD3p

0

20

Jack Rae

@jack_w_rae

11 months

Appreciate @aidan_mclau looking into the thinking model results. Originally scores looked weak as the response was plucked from the thought content versus output. We are looking into ways of making thinking output less confusing for people running evals. This is why we 🚢, to

Aidan McLaughlin

@aidan_mclau

11 months

two aidanbench updates: > gemini-2.0-flash-thinking is now #2 (explanation for score change below) > deepseek v3 is #22 (thoughts below)

5

9

103

Fangyu Liu

@hardy_qr

11 months

James made incredible contributions to the thinking models. Smart agents are only distillations of other smart agents.

James An

@jamesjyan117153

11 months

So proud to be one of the main core contributors on this effort! Really enjoyed making Gemini smarter!!!

1

0

31

Fangyu Liu

@hardy_qr

11 months

A good thinker doesn't necessarily have to underperform in other tasks 😉

lmarena.ai

@arena

11 months

Breaking news from Chatbot Arena⚡🤔 @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories! The leap from Gemini-2.0-Flash: - Overall: #3 → #1 - Overall (Style Control): #4 → #1 - Math: #2 → #1 - Creative Writing: #2 → #1 - Hard Prompts: #1 → #1

0

1

21

Fangyu Liu

@hardy_qr

11 months

What's your Final Answer?

Noam Shazeer

@NoamShazeer

11 months

We’ve been *thinking* about how to improve model reasoning and explainability Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try

0

2

Jeff Dean

@JeffDean

11 months

Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time

127

478

4K

Fangyu Liu

@hardy_qr

11 months

A significant portion of what we read today is machine-generated. Fast forward a few years, it might be 95%+ machine-generated. It is a pretty fascinating experiment we are running. Are we as a species gonna mode-collapse, or self-improve?

2

0

4

Fangyu Liu

@hardy_qr

11 months

Me opening this app

1

0

3

adi

@adonis_singh

11 months

'massive organic church of gemini' - gemini 2.0 flash

9

8

148

Robert Riachi

@robertriachi

11 months

A simple yet powerful example of the new Gemini 2.0 Flash's native multimodal input + output. Precise conversational editing & reasoning! Next step, Chess!

23

47

411

Fangyu Liu

@hardy_qr

11 months

It's cool to see capabilities compounding. Progress on one front eventually accelerates progress on other fronts: ultra long-context, MM-in/out, reasoning/planning, agency, ... And it's all just one model!

Demis Hassabis

@demishassabis

11 months

Thrilled to kick off the Gemini 2.0 era with Gemini 2.0 Flash, an update to our workhorse model that outperforms even 1.5 Pro at twice the speed. It has really great multilingual skills, and can natively call tools, like Google Search. It’s the first release in the Gemini 2.0

0

13

Kaushik Shivakumar

@19kaushiks

11 months

Super excited for native image out to be released. Had the opportunity to work with a brilliant team to take this from idea to product over the past year. First going to early access partners, then more widely in early 2025. We'll be sharing some cool demos throughout the day

Google DeepMind

@GoogleDeepMind

11 months

As our workhorse model, Gemini 2.0 Flash outperforms 1.5 Pro on key benchmarks, at twice the speed. It can generate images mixed with text as well as customizable text-to-speech multilingual audio. 2.0 Flash can also call tools like @Google Search, code execution and third-party

2

6

107

Fangyu Liu

@hardy_qr

11 months

(I have these thoughts perhaps because I liked reading Kevin Kelly as a teenager. And yes I paid Elon 4 bucks to have enough token budget to post anything this long.)

1

0

5