Lily Zhang Profile
Lily Zhang

@lilyhzhang

Followers: 626
Following: 942
Media: 20
Statuses: 113

Research Scientist @GoogleDeepmind. Previously @Meta, @Google, @NYU, @Harvard.

Joined June 2021
@lilyhzhang
Lily Zhang
9 months
Interest in preference learning has exploded since RLHF, but preference learning is still immature relative to other ML tasks like classification. How do we close this gap? Our new paper, “Preference Learning Made Easy,” offers some suggestions🧵
5
12
99
@lilyhzhang
Lily Zhang
4 months
Thrilled to share the Community Alignment dataset -- the product of a massive collaborative effort with so many awesome folks. Can't wait to see the future research it unlocks!
@SmithaMilli
smitha milli
4 months
Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵
0
5
31
@SmithaMilli
smitha milli
4 months
This was a big project and collective effort -- major thanks to all the collaborators (see image)🙏 @lilyhzhang and I will be presenting it at the ICML MoFA workshop on Friday, say hi if you want to chat more!
1
4
20
@lilyhzhang
Lily Zhang
9 months
Working on this paper has really simplified my own understanding of preference learning. I hope reading it will do the same for you! https://t.co/B8zr0W9wOi (10/n)
1
1
8
@lilyhzhang
Lily Zhang
9 months
Our analysis also offers concrete guidance for new research. Namely, future efforts should focus on either 1. developing non-WRO objectives more closely aligned with WRO (i.e., better surrogate objectives) or 2. improving the optimization of WRO objectives. (9/n)
1
0
2
@lilyhzhang
Lily Zhang
9 months
Our analysis offers and motivates practical advice including: 1. Use multiple seeds for RLHF to address optimization challenges, 2. Ensure careful checkpointing in DPO due to lack of win rate correspondence, and 3. Prioritize (preference) diversity to improve SFT. (8/n)
1
0
2
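A minimal sketch of how point 2 might look in practice: select the DPO checkpoint by held-out win rate against a reference policy rather than by training loss. Every name here (select_checkpoint, generate, judge) is a placeholder assumed for illustration, not something from the paper.

# Sketch: pick the checkpoint whose generations are preferred most often
# versus a fixed reference, instead of the one with the lowest training loss.
# All object and function names are placeholders assumed for illustration.
def select_checkpoint(checkpoints, prompts, reference_model, judge):
    """Return the checkpoint with the highest held-out win rate."""
    best_ckpt, best_win_rate = None, -1.0
    for ckpt in checkpoints:
        wins = 0
        for prompt in prompts:
            y = ckpt.generate(prompt)                  # candidate response
            y_ref = reference_model.generate(prompt)   # reference response
            wins += bool(judge(prompt, y, y_ref))      # True if y is preferred
        win_rate = wins / max(len(prompts), 1)
        if win_rate > best_win_rate:
            best_ckpt, best_win_rate = ckpt, win_rate
    return best_ckpt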
@lilyhzhang
Lily Zhang
9 months
Empirically, we see WRO methods underperform relative to theoretical expectations due to difficulties in optimization. In fact, we observe across WRO objectives that optimization success matters more than other choices which affect the target distribution. (7/n)
1
0
2
@lilyhzhang
Lily Zhang
9 months
Our framework enables us to easily extend WRO beyond existing methods, e.g., to generic win rate games. Our analysis also presents theoretical limitations of non-WRO methods to explain empirical results observed previously, from checkpointing in DPO to data creation in SFT. (6/n)
1
1
1
@lilyhzhang
Lily Zhang
9 months
We identify two important theoretical benefits of WRO that generally do not hold in non-WRO: 1. Win rate-correspondence (improving the objective improves win rate), and 2. Win rate-consistency (optimum of the objective maximizes win rate). (5/n)
1
0
1
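One way to write these two properties down, in notation assumed here rather than quoted from the paper (L for a preference-learning objective, WR(π_θ ≻ π_ref) for the model's win rate over a fixed reference):

% Notation assumed for illustration; the paper's exact statements may differ.
% Win rate-correspondence: increasing the objective does not decrease win rate.
\[
L(\theta_2) > L(\theta_1)
\;\Longrightarrow\;
\mathrm{WR}(\pi_{\theta_2} \succ \pi_{\mathrm{ref}}) \ge \mathrm{WR}(\pi_{\theta_1} \succ \pi_{\mathrm{ref}})
\]
% Win rate-consistency: any maximizer of the objective also maximizes win rate.
\[
\theta^\star \in \arg\max_\theta L(\theta)
\;\Longrightarrow\;
\theta^\star \in \arg\max_\theta \mathrm{WR}(\pi_{\theta} \succ \pi_{\mathrm{ref}})
\]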
@lilyhzhang
Lily Zhang
9 months
We next characterize the space of preference learning as win rate optimization (WRO) and non-WRO, e.g., RLHF and Nash Learning from Human Feedback are a WRO objective and game respectively, and DPO / other direct alignment algorithms and SFT are non-WRO objectives. (4/n)
1
0
1
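Roughly, and in assumed notation rather than the paper's, a WRO objective maximizes win rate against a fixed comparison policy, while a win rate game (as in Nash Learning from Human Feedback) pits two policies against each other:

% Sketch in assumed notation; see the paper for the precise formulation.
\[
\text{WRO objective:}\quad \max_{\theta}\; \mathrm{WR}(\pi_\theta \succ \pi_{\mathrm{ref}})
\qquad\qquad
\text{Win rate game:}\quad \max_{\pi}\,\min_{\pi'}\; \mathrm{WR}(\pi \succ \pi')
\]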
@lilyhzhang
Lily Zhang
9 months
Win rate is already a common evaluation, but our result yields a deeper insight: it is the only appropriate evaluation based on preference data alone; any other function either fails to evaluate the model or does not respect the preference data. (3/n)
1
0
1
@lilyhzhang
Lily Zhang
9 months
We start by establishing what evaluations make sense for preference learning. Our first insight is that the only evaluation of a generative model that respects preferences and prevalences of the underlying preference data sampling distribution is some variant of win rate. (2/n)
1
0
2
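For concreteness, one standard way to write the win rate of a model π_θ against a comparison policy π′ under the preference data distribution (notation assumed here, not necessarily the paper's):

% Assumed notation: p(x) is the prompt distribution and p(y ≻ y' | x) the
% probability, under the preference data, that y is preferred to y'.
\[
\mathrm{WR}(\pi_\theta \succ \pi')
= \mathbb{E}_{x \sim p(x)}\,
  \mathbb{E}_{y \sim \pi_\theta(\cdot\mid x),\; y' \sim \pi'(\cdot\mid x)}
  \bigl[\, p(y \succ y' \mid x) \,\bigr]
\]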
@_rk_singhal
Raghav Singhal
10 months
Got a diffusion model? What if there were a way to:
- Get SOTA text-to-image prompt fidelity, with no extra training!
- Steer continuous and discrete (e.g. text) diffusions
- Beat larger models using less compute
- Outperform fine-tuning
- And keep your stats friends happy!?
6
41
208
@NYUDataScience
NYU Center for Data Science
1 year
CDS PhD student Lily H. Zhang @lilyhzhang, with CDS Faculty Fellow Aahlad Puli @aahladpuli & CDS Assoc. Prof. Rajesh Ranganath, proposes a novel AI-based approach for detecting rare particles at the Large Hadron Collider. https://t.co/G6seCQK5Kr
nyudatascience.medium.com
CDS researchers introduce a novel AI method for detecting rare particles at the Large Hadron Collider.
0
1
5
@lilyhzhang
Lily Zhang
1 year
Work done during internship with Google Translate Research, with Hamid Dadkhahi, @marafinkels, Firas Trabelsi, Jiaming Luo, and @markuseful. I see TWA as part of a larger trend to harness alternative sources of supervision to enhance language models—this is just the start! (n/n)
0
0
1
@lilyhzhang
Lily Zhang
1 year
Experiments on English-German and Chinese-English machine translation show that TWA outperforms baselines such as supervised finetuning on sequences filtered for quality and Direct Preference Optimization on pairs constructed from the same data (5/n).
1
0
2
@lilyhzhang
Lily Zhang
1 year
We test TWA with existing Multidimensional Quality Metrics (MQM) data from past WMT competitions, consisting of machine translations and annotations of their errors. This data has been used to evaluate MT systems but has not yet been used to directly finetune them (4/n).
1
0
1
@lilyhzhang
Lily Zhang
1 year
TWA is also very simple: cross-entropy on tokens preceding the first error (assuming they are high-quality), span-level unlikelihood on errors (to let the model learn what to penalize), and zero loss on non-error tokens after an error (given they are off-trajectory) (3/n).
1
0
2
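Reading the loss described in this tweet literally, a sketch might look like the PyTorch function below; twa_loss and its inputs are names assumed for illustration, not the paper's reference implementation.

import torch

def twa_loss(token_logprobs: torch.Tensor, error_mask: torch.Tensor) -> torch.Tensor:
    """token_logprobs[t]: model log-prob of the observed token at position t.
    error_mask[t]: 1 if position t lies inside an annotated error span, else 0."""
    T = token_logprobs.shape[0]
    err = error_mask.bool()
    err_positions = err.nonzero(as_tuple=True)[0]
    first_error = int(err_positions[0]) if err_positions.numel() > 0 else T

    # 1. Standard cross-entropy on tokens preceding the first error
    #    (treated as high-quality context).
    loss = -token_logprobs[:first_error].sum()

    # 2. Span-level unlikelihood on each contiguous error span: penalize the
    #    probability the model assigns to the span as a whole.
    t = first_error
    while t < T:
        if err[t]:
            start = t
            while t < T and err[t]:
                t += 1
            span_prob = token_logprobs[start:t].sum().exp().clamp(max=1 - 1e-6)
            loss = loss - torch.log1p(-span_prob)
        else:
            # 3. Non-error tokens after the first error contribute zero loss
            #    (the continuation is off-trajectory).
            t += 1
    return loss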
@lilyhzhang
Lily Zhang
1 year
TWA differs from other methods which utilize fine-grained feedback in that it directly finetunes a language model with examples and their span-level error annotations, allowing it to take advantage of offline data without the need for an auxiliary annotation or reward model (2/n)
1
0
1