
Dan Deutsch
@_danieldeutsch
Followers: 613
Following: 120
Media: 16
Statuses: 90
Research Scientist at Google Translate working on text generation evaluation
San Francisco
Joined September 2012
Excited to receive an Outstanding Paper award for this work at @emnlpmeeting! Thanks to my co-authors George Foster and @markuseful! Updated version available here:
LLM-based metrics like GEMBA predict many ties, but the way that ties should be handled in Kendall’s tau for meta-evaluating metrics has been a longstanding issue. We propose an update to the meta-evaluation methodology to handle ties.
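To see why tie handling matters here, the sketch below compares two standard Kendall's tau variants on toy data. It is only an illustration: the scores are made up, the function names (`pair_counts`, `tau_a`, `tau_b`, `pairwise_accuracy`) are hypothetical, and the tie-aware pairwise accuracy at the end is just one simple way to credit correctly predicted ties, not necessarily the formulation proposed in the paper.

```python
from itertools import combinations
from math import sqrt


def pair_counts(human, metric):
    """Classify every item pair by how the human and metric scores order it."""
    c = d = t_h = t_m = t_hm = 0
    for i, j in combinations(range(len(human)), 2):
        dh = human[i] - human[j]
        dm = metric[i] - metric[j]
        if dh == 0 and dm == 0:
            t_hm += 1          # tied in both
        elif dh == 0:
            t_h += 1           # tied only in the human scores
        elif dm == 0:
            t_m += 1           # tied only in the metric scores
        elif (dh > 0) == (dm > 0):
            c += 1             # concordant
        else:
            d += 1             # discordant
    return c, d, t_h, t_m, t_hm


def tau_a(human, metric):
    """Kendall's tau-a: every pair is in the denominator, ties earn nothing."""
    c, d, t_h, t_m, t_hm = pair_counts(human, metric)
    return (c - d) / (c + d + t_h + t_m + t_hm)


def tau_b(human, metric):
    """Kendall's tau-b: the denominator is adjusted for ties in each ranking."""
    c, d, t_h, t_m, _ = pair_counts(human, metric)
    denom = sqrt((c + d + t_h) * (c + d + t_m))
    return (c - d) / denom if denom else float("nan")


def pairwise_accuracy(human, metric):
    """Illustrative assumption: count a pair as correct if it is concordant
    or tied in both rankings."""
    c, d, t_h, t_m, t_hm = pair_counts(human, metric)
    return (c + t_hm) / (c + d + t_h + t_m + t_hm)


# Toy data (made up): the human scores contain one tie.
human = [1, 2, 2, 3]
metric_no_ties = [0.1, 0.4, 0.5, 0.9]   # breaks the human tie
metric_with_ties = [1, 2, 2, 3]         # reproduces the human tie exactly

for name, m in [("no ties", metric_no_ties), ("with ties", metric_with_ties)]:
    print(name, round(tau_a(human, m), 4), round(tau_b(human, m), 4),
          round(pairwise_accuracy(human, m), 4))
```

On this toy data, tau-a gives both metrics the same score even though one of them matches the human ranking exactly, while tau-b and the pairwise accuracy separate them. The choice of statistic therefore changes how a tie-heavy metric is ranked.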
RT @iseeaswell: Working on Low Resource Languages? Want to help with SMOL? Join our new Discord!
RT @markuseful: Two new datasets from Google Translate targeting high and low resource languages! WMT24++: 46 new en->xx languages to WMT…
RT @iseeaswell: 😼SMOL DATA ALERT! 😼Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: https://…
@shrutirij @prk_riley @esalesk @FirasTr88060642 Stephanie Winkler @BZhangGo @markuseful #nlproc #nlp #ai
This project was a highly collaborative effort with many people contributing translations, evaluations, analyses, etc., so I want to thank all of my co-authors! @ebriakou @iseeaswell @marafinkels Rebecca Galor @JurikJuraska @gezakovacs Alison Lui @RicardoRei7 @jasonriesa.
RT @mykocyigit: Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 Models on…
RT @JurikJuraska: 🚀 We have just released bfloat16 variants of all 3 MetricX-24 models, offering nearly identical performance to their floa…
RT @JurikJuraska: 🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now ope…
Super simple and effective way of significantly increasing the performance of your evaluation metric!
LLMs are typically evaluated w/ automatic metrics on standard test sets, but metrics + test sets are developed independently. This raises a crucial question: Can we design automatic metrics specifically to excel on the test sets we prioritize? Answer: Yes!