Dan Zheng
@dancherp
Followers: 1K · Following: 5K · Media: 558 · Statuses: 2K
Learning for Code @GoogleDeepMind 💭 Programming languages and machine learning
Mountain View, CA
Joined October 2010
Laser eye surgery later today, you might not see me in glasses again
Minimize time spent sitting and listening; maximize social interaction time
Conference workshops should have more time for posters / socialization than talks!
At the #NeurIPS2025 Google booth today 11 am to 12 pm if you'd like to chat about code AI, working in industry, or more!
Excited to be at #NeurIPS2025 - reach out if you'd like to chat about code AI, particularly generated code security or RL for code :)
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
I think the "self-distillation via optimized prompts" idea (↓) is like "on-policy distillation", but using a prompt-optimized model as the reverse-KL teacher for its "basic system prompt" self. Idea: iterate prompt optimization and self-distillation for gains?
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
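One rough way to read this as code: the same model plays teacher (optimized system prompt, no gradients) and student (basic system prompt), with a per-token reverse KL on completions sampled from the student, in the spirit of on-policy distillation. This is only a sketch of the idea, not anyone's actual recipe; `sample_completion` and `next_token_logits` are hypothetical helpers, not a real API.

```python
# Sketch: self-distillation via an optimized prompt, framed as on-policy distillation.
import torch
import torch.nn.functional as F

def self_distillation_step(model, optimizer, basic_sys, optimized_sys, user_prompt):
    # On-policy: sample a completion from the student view (basic system prompt).
    completion_ids = sample_completion(model, basic_sys, user_prompt)  # hypothetical helper

    # Teacher view: the same weights, conditioned on the optimized system prompt.
    with torch.no_grad():
        teacher_logits = next_token_logits(model, optimized_sys, user_prompt, completion_ids)  # hypothetical

    # Student view: basic system prompt, gradients enabled.
    student_logits = next_token_logits(model, basic_sys, user_prompt, completion_ids)  # hypothetical

    # Per-token reverse KL(student || teacher), averaged over the sampled completion.
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```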
The liminal space where your devices haven't yet updated to the new time zone
Downloading a local LLM before a flight: carrying a pocket-sized, offline, internet-knowledge chat system
This also connects prompt optimization (which is for specific downstream tasks and requires using the optimized system prompts) with model improvement (baking the improvements into the model's default behavior, robust even for simple user prompts, with optimized prompts no longer needed).
An ideal plot might look like this for iterated prompt optimization → prompt distillation (adapted from the GEPA paper). Maybe there'll be some results like this soon :)
Promising idea: prompt optimization (getting max performance with the model, without training) → prompt distillation (baking the improved performance into the model, via RL), iteratively
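The iterated loop described above might look roughly like this, as a sketch under assumptions: `optimize_prompt`, `distill`, and `evaluate` are hypothetical stand-ins (e.g. a GEPA-style prompt optimizer and a distillation step like the one sketched earlier), not real library calls.

```python
# Sketch: iterate prompt optimization -> prompt distillation.
def iterate_prompt_optimization_and_distillation(model, tasks, basic_sys, rounds=3):
    for r in range(rounds):
        # 1) Prompt optimization: max performance with the current model, no training.
        optimized_sys = optimize_prompt(model, tasks, seed_prompt=basic_sys)  # hypothetical

        # 2) Prompt distillation: bake the prompted behavior into the weights
        #    (via RL against task reward, or a KL-based self-distillation step).
        model = distill(model, teacher_sys=optimized_sys, student_sys=basic_sys, tasks=tasks)  # hypothetical

        # Track how the model does with only the basic prompt after each round.
        print(f"round {r}: score with basic prompt = {evaluate(model, basic_sys, tasks):.3f}")  # hypothetical
    return model
```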
NB: prompt distillation doesn't have to be RL. It could be soft distillation, from [optimized system prompt, user prompt → output] to [basic system prompt, user prompt → output] for the same model, with some KL divergence loss, or even SFT.
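For the SFT flavor of that, a minimal sketch (same hypothetical helpers as above, not a real API): generate outputs with the optimized system prompt, then fine-tune the same model to reproduce them from the basic system prompt; a KL term on the teacher's token distributions could be added as in the earlier sketch.

```python
# Sketch: prompt distillation via plain SFT instead of RL.
import torch
import torch.nn.functional as F

def prompt_distillation_sft_step(model, optimizer, basic_sys, optimized_sys, user_prompt):
    # Teacher data: the model's own output under the optimized system prompt.
    with torch.no_grad():
        target_ids = sample_completion(model, optimized_sys, user_prompt)  # hypothetical helper

    # Student: same model, basic system prompt, trained to reproduce that output.
    logits = next_token_logits(model, basic_sys, user_prompt, target_ids)  # hypothetical helper
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```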