Lily Zhang Profile
Lily Zhang

@lilyhzhang

Followers: 626
Following: 942
Media: 20
Statuses: 113

Research Scientist @GoogleDeepmind. Previously @Meta, @Google, @NYU, @Harvard.

Joined June 2021
@lilyhzhang
Lily Zhang
9 months
Interest in preference learning has exploded since RLHF, but preference learning is still immature relative to other ML tasks like classification. How do we close this gap? Our new paper, “Preference Learning Made Easy,” offers some suggestions🧵
5
12
99
@lilyhzhang
Lily Zhang
4 months
Thrilled to share the Community Alignment dataset -- the product of a massive collaborative effort with so many awesome folks. Can't wait to see the future research it unlocks!
@SmithaMilli
smitha milli
4 months
Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵
0
5
31
@SmithaMilli
smitha milli
4 months
This was a big project and collective effort -- major thanks to all the collaborators (see image)🙏 @lilyhzhang and I will be presenting it at the ICML MoFA workshop on Friday, say hi if you want to chat more!
1
4
20
@lilyhzhang
Lily Zhang
9 months
Working on this paper has really simplified my own understanding of preference learning. I hope reading it will do the same for you! https://t.co/B8zr0W9wOi (10/n)
1
1
8
@lilyhzhang
Lily Zhang
9 months
Our analysis also offers concrete guidance for new research. Namely, future efforts should focus on either 1. developing non-WRO objectives more closely aligned with WRO (i.e., better surrogate objectives) or 2. improving the optimization of WRO objectives. (9/n)
1
0
2
@lilyhzhang
Lily Zhang
9 months
Our analysis offers and motivates practical advice including: 1. Use multiple seeds for RLHF to address optimization challenges, 2. Ensure careful checkpointing in DPO due to lack of win rate correspondence, and 3. Prioritize (preference) diversity to improve SFT. (8/n)
1
0
2
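A minimal sketch of how point 2 might look in practice: select the DPO checkpoint by held-out win rate against a reference policy rather than by training loss. Every name here (select_checkpoint, generate, judge) is a placeholder assumed for illustration, not something from the paper.

# Sketch: pick the checkpoint whose generations are preferred most often
# versus a fixed reference, instead of the one with the lowest training loss.
# All object and function names are placeholders assumed for illustration.
def select_checkpoint(checkpoints, prompts, reference_model, judge):
    """Return the checkpoint with the highest held-out win rate."""
    best_ckpt, best_win_rate = None, -1.0
    for ckpt in checkpoints:
        wins = 0
        for prompt in prompts:
            y = ckpt.generate(prompt)                  # candidate response
            y_ref = reference_model.generate(prompt)   # reference response
            wins += bool(judge(prompt, y, y_ref))      # True if y is preferred
        win_rate = wins / max(len(prompts), 1)
        if win_rate > best_win_rate:
            best_ckpt, best_win_rate = ckpt, win_rate
    return best_ckpt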
@lilyhzhang
Lily Zhang
9 months
Empirically, we see WRO methods underperform relative to theoretical expectations due to difficulties in optimization. In fact, we observe across WRO objectives that optimization success matters more than other choices which affect the target distribution. (7/n)
1
0
2
@lilyhzhang
Lily Zhang
9 months
Our framework enables us to easily extend WRO beyond existing methods, e.g., to generic win rate games. Our analysis also presents theoretical limitations of non-WRO methods to explain empirical results observed previously, from checkpointing in DPO to data creation in SFT. (6/n)
1
1
1
@lilyhzhang
Lily Zhang
9 months
We identify two important theoretical benefits of WRO that generally do not hold in non-WRO: 1. Win rate-correspondence (improving the objective improves win rate), and 2. Win rate-consistency (optimum of the objective maximizes win rate). (5/n)
1
0
1
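One way to write these two properties down, in notation assumed here rather than quoted from the paper (L for a preference-learning objective, WR(π_θ ≻ π_ref) for the model's win rate over a fixed reference):

% Notation assumed for illustration; the paper's exact statements may differ.
% Win rate-correspondence: increasing the objective does not decrease win rate.
\[
L(\theta_2) > L(\theta_1)
\;\Longrightarrow\;
\mathrm{WR}(\pi_{\theta_2} \succ \pi_{\mathrm{ref}}) \ge \mathrm{WR}(\pi_{\theta_1} \succ \pi_{\mathrm{ref}})
\]
% Win rate-consistency: any maximizer of the objective also maximizes win rate.
\[
\theta^\star \in \arg\max_\theta L(\theta)
\;\Longrightarrow\;
\theta^\star \in \arg\max_\theta \mathrm{WR}(\pi_{\theta} \succ \pi_{\mathrm{ref}})
\]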
@lilyhzhang
Lily Zhang
9 months
We next characterize the space of preference learning as win rate optimization (WRO) and non-WRO, e.g., RLHF and Nash Learning from Human Feedback are a WRO objective and game respectively, and DPO / other direct alignment algorithms and SFT are non-WRO objectives. (4/n)
1
0
1
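Roughly, and in assumed notation rather than the paper's, a WRO objective maximizes win rate against a fixed comparison policy, while a win rate game (as in Nash Learning from Human Feedback) pits two policies against each other:

% Sketch in assumed notation; see the paper for the precise formulation.
\[
\text{WRO objective:}\quad \max_{\theta}\; \mathrm{WR}(\pi_\theta \succ \pi_{\mathrm{ref}})
\qquad\qquad
\text{Win rate game:}\quad \max_{\pi}\,\min_{\pi'}\; \mathrm{WR}(\pi \succ \pi')
\]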
@lilyhzhang
Lily Zhang
9 months
Win rate is already a common evaluation, but our result yields a deeper insight: it is the only appropriate evaluation based on preference data alone; any other function either fails to evaluate the model or does not respect the preference data. (3/n)
1
0
1
@lilyhzhang
Lily Zhang
9 months
We start by establishing what evaluations make sense for preference learning. Our first insight is that the only evaluation of a generative model that respects preferences and prevalences of the underlying preference data sampling distribution is some variant of win rate. (2/n)
1
0
2
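For concreteness, one standard way to write the win rate of a model π_θ against a comparison policy π′ under the preference data distribution (notation assumed here, not necessarily the paper's):

% Assumed notation: p(x) is the prompt distribution and p(y ≻ y' | x) the
% probability, under the preference data, that y is preferred to y'.
\[
\mathrm{WR}(\pi_\theta \succ \pi')
= \mathbb{E}_{x \sim p(x)}\,
  \mathbb{E}_{y \sim \pi_\theta(\cdot\mid x),\; y' \sim \pi'(\cdot\mid x)}
  \bigl[\, p(y \succ y' \mid x) \,\bigr]
\]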
@_rk_singhal
Raghav Singhal
10 months
Got a diffusion model? What if there were a way to:
- Get SOTA text-to-image prompt fidelity, with no extra training!
- Steer continuous and discrete (e.g. text) diffusions
- Beat larger models using less compute
- Outperform fine-tuning
- And keep your stats friends happy!?
6
41
208
@NYUDataScience
NYU Center for Data Science
1 year
CDS PhD student Lily H. Zhang @lilyhzhang, with CDS Faculty Fellow Aahlad Puli @aahladpuli & CDS Assoc. Prof. Rajesh Ranganath, proposes a novel AI-based approach for detecting rare particles at the Large Hadron Collider. https://t.co/G6seCQK5Kr
nyudatascience.medium.com
CDS researchers introduce a novel AI method for detecting rare particles at the Large Hadron Collider.
0
1
5
@lilyhzhang
Lily Zhang
1 year
Work done during internship with Google Translate Research, with Hamid Dadkhahi, @marafinkels, Firas Trabelsi, Jiaming Luo, and @markuseful. I see TWA as part of a larger trend to harness alternative sources of supervision to enhance language models—this is just the start! (n/n)
0
0
1
@lilyhzhang
Lily Zhang
1 year
Experiments on English-German and Chinese-English machine translation show that TWA outperforms baselines such as supervised finetuning on sequences filtered for quality and Direct Preference Optimization on pairs constructed from the same data (5/n).
1
0
2
@lilyhzhang
Lily Zhang
1 year
We test TWA with existing Multidimensional Quality Metrics (MQM) data from past WMT competitions, consisting of machine translations and annotations of their errors. This data has been used to evaluate MT systems but has not yet been used to directly finetune them (4/n).
1
0
1
@lilyhzhang
Lily Zhang
1 year
TWA is also very simple: cross-entropy on tokens preceding the first error (assuming they are high-quality), span-level unlikelihood on errors (to let the model learn what to penalize), and zero loss on non-error tokens after an error (given they are off-trajectory) (3/n).
1
0
2
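Reading the loss described in this tweet literally, a sketch might look like the PyTorch function below; twa_loss and its inputs are names assumed for illustration, not the paper's reference implementation.

import torch

def twa_loss(token_logprobs: torch.Tensor, error_mask: torch.Tensor) -> torch.Tensor:
    """token_logprobs[t]: model log-prob of the observed token at position t.
    error_mask[t]: 1 if position t lies inside an annotated error span, else 0."""
    T = token_logprobs.shape[0]
    err = error_mask.bool()
    err_positions = err.nonzero(as_tuple=True)[0]
    first_error = int(err_positions[0]) if err_positions.numel() > 0 else T

    # 1. Standard cross-entropy on tokens preceding the first error
    #    (treated as high-quality context).
    loss = -token_logprobs[:first_error].sum()

    # 2. Span-level unlikelihood on each contiguous error span: penalize the
    #    probability the model assigns to the span as a whole.
    t = first_error
    while t < T:
        if err[t]:
            start = t
            while t < T and err[t]:
                t += 1
            span_prob = token_logprobs[start:t].sum().exp().clamp(max=1 - 1e-6)
            loss = loss - torch.log1p(-span_prob)
        else:
            # 3. Non-error tokens after the first error contribute zero loss
            #    (the continuation is off-trajectory).
            t += 1
    return loss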
@lilyhzhang
Lily Zhang
1 year
TWA differs from other methods which utilize fine-grained feedback in that it directly finetunes a language model with examples and their span-level error annotations, allowing it to take advantage of offline data without the need for an auxiliary annotation or reward model (2/n)
1
0
1