Dan Zheng
@dancherp
Followers: 1K · Following: 5K · Media: 558 · Statuses: 2K
Learning for Code @GoogleDeepMind 💭 Programming languages and machine learning
Mountain View, CA
Joined October 2010
Laser eye surgery later today, you might not see me in glasses again
Minimize time spent sitting and listening; maximize social interaction time
Conference workshops should have more time for posters / socialization than talks!
At the #NeurIPS2025 Google booth today 11 am to 12 pm if you'd like to chat about code AI, working in industry, or more!
Excited to be at #NeurIPS2025 - reach out if you'd like to chat about code AI, particularly generated code security or RL for code :)
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
I think the "self-distillation via optimized prompts" idea (↓) is like "on-policy distillation", but using a prompt-optimized model as the reverse-KL teacher for its "basic system prompt" self. Idea: iterate prompt optimization and self-distillation for gains?
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
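One rough way to read this as code: the same model plays teacher (optimized system prompt, no gradients) and student (basic system prompt), with a per-token reverse KL on completions sampled from the student, in the spirit of on-policy distillation. This is only a sketch of the idea, not anyone's actual recipe; `sample_completion` and `next_token_logits` are hypothetical helpers, not a real API.

```python
# Sketch: self-distillation via an optimized prompt, framed as on-policy distillation.
import torch
import torch.nn.functional as F

def self_distillation_step(model, optimizer, basic_sys, optimized_sys, user_prompt):
    # On-policy: sample a completion from the student view (basic system prompt).
    completion_ids = sample_completion(model, basic_sys, user_prompt)  # hypothetical helper

    # Teacher view: the same weights, conditioned on the optimized system prompt.
    with torch.no_grad():
        teacher_logits = next_token_logits(model, optimized_sys, user_prompt, completion_ids)  # hypothetical

    # Student view: basic system prompt, gradients enabled.
    student_logits = next_token_logits(model, basic_sys, user_prompt, completion_ids)  # hypothetical

    # Per-token reverse KL(student || teacher), averaged over the sampled completion.
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```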
The liminal space where your devices haven't yet updated to the new time zone
Downloading a local LLM before a flight: carrying a pocket-sized, offline, internet-knowledge chat system
This also connects prompt optimization (which is for specific downstream tasks and requires using the optimized system prompts) with model improvement (baking the improvements into the model's default behavior, robust even for simple user prompts, with optimized prompts no longer needed).
An ideal plot might look like this for iterated prompt optimization → prompt distillation (adapted from the GEPA paper). Maybe there'll be some results like this soon :)
Promising idea: prompt optimization (getting max performance with the model, without training) → prompt distillation (baking the improved performance into the model, via RL), iteratively
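The iterated loop described above might look roughly like this, as a sketch under assumptions: `optimize_prompt`, `distill`, and `evaluate` are hypothetical stand-ins (e.g. a GEPA-style prompt optimizer and a distillation step like the one sketched earlier), not real library calls.

```python
# Sketch: iterate prompt optimization -> prompt distillation.
def iterate_prompt_optimization_and_distillation(model, tasks, basic_sys, rounds=3):
    for r in range(rounds):
        # 1) Prompt optimization: max performance with the current model, no training.
        optimized_sys = optimize_prompt(model, tasks, seed_prompt=basic_sys)  # hypothetical

        # 2) Prompt distillation: bake the prompted behavior into the weights
        #    (via RL against task reward, or a KL-based self-distillation step).
        model = distill(model, teacher_sys=optimized_sys, student_sys=basic_sys, tasks=tasks)  # hypothetical

        # Track how the model does with only the basic prompt after each round.
        print(f"round {r}: score with basic prompt = {evaluate(model, basic_sys, tasks):.3f}")  # hypothetical
    return model
```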
NB: prompt distillation doesn't have to be RL. It could be soft distillation, from [optimized system prompt, user prompt → output] to [basic system prompt, user prompt → output] for the same model, with some KL divergence loss, or even SFT.
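For the SFT flavor of that, a minimal sketch (same hypothetical helpers as above, not a real API): generate outputs with the optimized system prompt, then fine-tune the same model to reproduce them from the basic system prompt; a KL term on the teacher's token distributions could be added as in the earlier sketch.

```python
# Sketch: prompt distillation via plain SFT instead of RL.
import torch
import torch.nn.functional as F

def prompt_distillation_sft_step(model, optimizer, basic_sys, optimized_sys, user_prompt):
    # Teacher data: the model's own output under the optimized system prompt.
    with torch.no_grad():
        target_ids = sample_completion(model, optimized_sys, user_prompt)  # hypothetical helper

    # Student: same model, basic system prompt, trained to reproduce that output.
    logits = next_token_logits(model, basic_sys, user_prompt, target_ids)  # hypothetical helper
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```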