Dan Zheng

@dancherp

Followers: 1K · Following: 5K · Media: 558 · Statuses: 2K

Learning for Code @GoogleDeepMind 💭 Programming languages and machine learning

Mountain View, CA
Joined October 2010
@dancherp
Dan Zheng
4 years
My dad retired today! I wanted to share some thoughts.
@dancherp
Dan Zheng
4 years
My dad retired today. I have a few things I'd like to say!
4
0
32
@dancherp
Dan Zheng
12 days
Laser eye surgery later today; you might not see me in glasses again
0
0
1
@dancherp
Dan Zheng
15 days
Dan-zheng with the stars :facepalm:
0
0
1
@dancherp
Dan Zheng
15 days
Danzing with the stars
@DZhang50
Dan Zhang @ NeurIPS
15 days
now recruiting for our new startup! we only hire people named Dan Z*ng
1
0
8
@dancherp
Dan Zheng
17 days
Minimize time spent sitting and listening; maximize social interaction time
0
0
0
@dancherp
Dan Zheng
17 days
Conference workshops should have more time for posters / socialization than talks!
1
0
5
@dancherp
Dan Zheng
20 days
At the #NeurIPS2025 Google booth today, 11 am to 12 pm, if you'd like to chat about code AI, working in industry, or more!
@dancherp
Dan Zheng
23 days
Excited to be at #NeurIPS2025 - reach out if you'd like to chat about code AI, particularly generated code security or RL for code :)
0
0
2
@GoogleDeepMind
Google DeepMind
1 month
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
218
1K
7K
@dancherp
Dan Zheng
2 months
@dancherp
Dan Zheng
5 months
Promising idea: prompt optimization (getting max performance out of a model, without training) → prompt distillation (baking the improved performance into the model, via RL) – iteratively
0
0
1
@dancherp
Dan Zheng
2 months
I think the "self-distillation via optimized prompts" idea (↓) is like "on-policy distillation", but using a prompt-optimized model as the reverse-KL teacher for its "basic system prompt" self. Iterate prompt optimization and self-distillation for gains? Idea:
@thinkymachines
Thinking Machines
2 months
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
1
0
6
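The tweet above sketches the idea in words; here is a minimal, hedged code sketch of one way to read it, assuming a HuggingFace-style causal LM (`model(...).logits`, `model.generate`, and `tokenizer(...)` are assumptions, and the system-prompt strings are placeholders). The teacher and student are the same weights, differing only in the system prompt, and the loss is a simplified per-token reverse-KL estimate on completions sampled from the student.

```python
import torch

def completion_logprobs(model, tokenizer, system_prompt, user_prompt, completion_ids):
    """Per-token log-probs the model assigns to completion_ids, given the prompts."""
    prompt_ids = tokenizer(system_prompt + "\n" + user_prompt,
                           return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, completion_ids], dim=-1)
    logits = model(ids).logits
    start = prompt_ids.shape[-1]
    logp = logits[:, start - 1:-1, :].log_softmax(dim=-1)  # positions predicting the completion
    return logp.gather(-1, completion_ids.unsqueeze(-1)).squeeze(-1)

def reverse_kl_self_distill_loss(model, tokenizer, optimized_sys, basic_sys, user_prompt):
    # 1) On-policy: sample a completion from the student "self" (basic system prompt).
    student_prompt_ids = tokenizer(basic_sys + "\n" + user_prompt,
                                   return_tensors="pt").input_ids
    with torch.no_grad():
        full = model.generate(student_prompt_ids, do_sample=True, max_new_tokens=256)
    completion_ids = full[:, student_prompt_ids.shape[-1]:]

    # 2) Score the same completion under both "selves" of the model.
    student_lp = completion_logprobs(model, tokenizer, basic_sys, user_prompt, completion_ids)
    with torch.no_grad():  # teacher = same weights, conditioned on the optimized prompt
        teacher_lp = completion_logprobs(model, tokenizer, optimized_sys, user_prompt,
                                         completion_ids)

    # 3) Simplified per-token reverse-KL estimate on student samples:
    #    E_student[log p_student - log p_teacher], averaged over completion tokens.
    return (student_lp - teacher_lp).mean()
```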
@dancherp
Dan Zheng
3 months
Another funny memoir name: Regress and Regrets (a la Jane Austen)
@dancherp
Dan Zheng
4 years
Imagine a memoir called “Diminishing Returns”
0
0
1
@dancherp
Dan Zheng
4 months
The liminal space where your devices haven't yet updated to the new time zone
0
0
3
@dancherp
Dan Zheng
4 months
But then still purchasing some WiFi
0
0
1
@dancherp
Dan Zheng
4 months
Downloading a local LLM before a flight: carrying a pocket-sized, offline, internet-knowledge chat system
2
0
7
@dancherp
Dan Zheng
5 months
@dancherp
Dan Zheng
5 months
An ideal plot might look like this, for iterated prompt optimization → prompt distillation (adapted from the GEPA paper). Maybe there'll be some results like this soon :)
0
1
3
@dancherp
Dan Zheng
5 months
Two different perspectives / use cases
0
1
4
@dancherp
Dan Zheng
5 months
This also connects prompt optimization – which targets specific downstream tasks and requires using the optimized system prompts – with model improvement: baking improvements into the model's default behavior, robust even for simple user prompts, with optimized prompts no longer needed
1
1
4
@dancherp
Dan Zheng
5 months
An ideal plot might look like this, for iterated prompt optimization → prompt distillation (adapted from the GEPA paper). Maybe there'll be some results like this soon :)
@dancherp
Dan Zheng
5 months
Promising idea: prompt optimization (getting max performance out of a model, without training) → prompt distillation (baking the improved performance into the model, via RL) – iteratively
4
2
15
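To make the "iteratively" part of the quoted idea concrete, here is a tiny sketch of the outer loop; `optimize_prompt` and `distill_prompt` are hypothetical stand-ins for the two stages described above (a GEPA-style prompt search and an RL/KL/SFT distillation step), not real APIs.

```python
def optimize_prompt(model, basic_prompt, tasks):
    """Hypothetical stage 1: search for a better system prompt (no weight updates),
    e.g. a GEPA-style prompt search over the task set."""
    raise NotImplementedError

def distill_prompt(model, optimized_prompt, basic_prompt, tasks):
    """Hypothetical stage 2: bake the optimized-prompt behavior into the weights
    (via RL, KL distillation, or SFT), so the basic prompt recovers the gains."""
    raise NotImplementedError

def iterate_optimize_then_distill(model, basic_prompt, tasks, rounds=3):
    # Alternate the two stages: each round's distilled model becomes the
    # starting point for the next round of prompt optimization.
    for _ in range(rounds):
        optimized = optimize_prompt(model, basic_prompt, tasks)
        model = distill_prompt(model, optimized, basic_prompt, tasks)
    return model
```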
@dancherp
Dan Zheng
5 months
NB: prompt distillation doesn't have to be RL. It could be soft distillation: from [optimized system prompt, user prompt → output] to [basic system prompt, user prompt → output] for the same model, with some KL divergence loss, or even SFT
0
0
5
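A minimal sketch of the soft-distillation option, again assuming a HuggingFace-style causal LM and placeholder prompt strings: forward KL between the per-token distributions the same model produces under the optimized vs. the basic system prompt, for a fixed user prompt and output.

```python
import torch
import torch.nn.functional as F

def soft_distill_loss(model, tokenizer, optimized_sys, basic_sys, user_prompt, output_text):
    """Forward KL(teacher || student) over output tokens; teacher and student are
    the SAME model, conditioned on different system prompts."""
    def output_logits(system_prompt):
        prompt_ids = tokenizer(system_prompt + "\n" + user_prompt,
                               return_tensors="pt").input_ids
        output_ids = tokenizer(output_text, add_special_tokens=False,
                               return_tensors="pt").input_ids
        ids = torch.cat([prompt_ids, output_ids], dim=-1)
        start = prompt_ids.shape[-1]
        return model(ids).logits[:, start - 1:-1, :]  # positions predicting the output tokens

    with torch.no_grad():  # teacher: optimized system prompt, frozen targets
        teacher_logp = output_logits(optimized_sys).log_softmax(dim=-1)
    student_logp = output_logits(basic_sys).log_softmax(dim=-1)  # student: basic system prompt

    # KL(teacher || student), summed over the vocab and averaged over output tokens.
    kl = F.kl_div(student_logp, teacher_logp, log_target=True, reduction="none")
    return kl.sum(dim=-1).mean()
```

Swapping the KL term for plain cross-entropy on outputs generated with the optimized prompt recovers the SFT variant mentioned in the tweet.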