Adam Fisch Profile
Adam Fisch

@adamjfisch

Followers
1K
Following
488
Media
19
Statuses
302

Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.

Joined August 2017
@adamjfisch
Adam Fisch
22 days
You need to evaluate an AI system and you have three things:
1. A cheap judge, which is noisy 🙈
2. An expensive judge, which is accurate 🧑‍⚖️
3. A budget 💸
How should you spend the budget to get the best possible estimate of model quality?
3
10
81
@adamjfisch
Adam Fisch
22 days
Work co-led with @ml_angelopoulos, whom we had the pleasure of briefly hosting here at @GoogleDeepMind for this collaboration, together with my GDM and GR colleagues @jacobeisenstein, @JonathanBerant, and Alekh Agarwal.
2
1
3
@adamjfisch
Adam Fisch
22 days
Empirically we get some nice results: our policies achieve the same precision at far lower budgets. Good potential for eval-hungry AI model dev cycles! Still, there are gaps: estimating the active policy well relies on good uncertainty estimates, and current models have some work to do.
1
0
2
@adamjfisch
Adam Fisch
22 days
We explore how much these policies improve over the naïve empirical estimates of E[H] using synthetic + real data. The optimal pi depends on unknown distributional properties of (X, H, G), so we examine performance in theory (using oracle rules) + in practice (when approximated).
1
1
3
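(A toy synthetic comparison of the kind described above, purely for intuition: it contrasts the naive estimate of E[H] from the expensive labels a budget can buy with a basic prediction-powered correction at a uniform sampling rate. All names and numbers are made up; the paper's experiments and optimal policies are more involved.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, budget = 10_000, 1_000                          # total examples vs. expensive labels we can afford

h = rng.binomial(1, 0.7, size=n).astype(float)     # expensive ratings H (ground truth)
g = np.clip(h + rng.normal(0, 0.3, size=n), 0, 1)  # cheap ratings G: noisy but correlated with H

# Naive: spend the whole budget on expensive labels for a random subset, ignore G.
idx = rng.choice(n, size=budget, replace=False)
naive = h[idx].mean()

# PPI-style: use the cheap rating everywhere, then debias with the same budget of H's.
ppi = g.mean() + (h[idx] - g[idx]).mean()

print(f"truth={h.mean():.3f}  naive={naive:.3f}  ppi={ppi:.3f}")
```

Because Var(H − G) is much smaller than Var(H) when the cheap judge is informative, the corrected estimate is typically tighter at the same cost.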
@adamjfisch
Adam Fisch
22 days
We solve for two types of policies: (1) the best fixed sampling rate, pi_random(x) = p*, that doesn't change with X, and (2) the best fully active policy pi_active(x) ∈ (0, 1]. Intuitively, fully active is better when G has variable accuracy (e.g., we see hard + easy Xs).
1
1
3
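(An illustrative sketch of the two policy shapes, with hypothetical names and a made-up heuristic; the paper derives the actual optimal forms.)

```python
def pi_random(x, p_star):
    """Fixed-rate policy: request the expensive rating H with the same
    probability p* for every response x."""
    return p_star

def pi_active(x, predicted_error, scale, floor=1e-3):
    """Active policy: spend more of the budget where the cheap judge is expected
    to be less reliable (a heuristic stand-in for the paper's optimal rule).
    `predicted_error(x)` is a hypothetical estimate of how far G is from H on x."""
    return min(1.0, max(floor, scale * predicted_error(x)))
```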
@adamjfisch
Adam Fisch
22 days
Specifically, building on the active PPI estimator of Zrnic and Candès, we derive a family of cost-optimal policies, pi(x), that determine the best probabilities for choosing to get H_t, versus choosing to just use G_t, for each X_t.
1
1
4
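(My paraphrase of the kind of inverse-propensity-corrected estimator this builds on, written for the mean; the paper's exact active PPI construction may differ. Here ξ_t ∈ {0, 1} records whether the expensive rating H_t was actually purchased.)

```latex
% Sketch of an inverse-propensity, PPI-style mean estimate (paraphrase, not the paper's exact form).
% xi_t ~ Bernoulli(pi(X_t)) indicates whether the expensive rating H_t was collected.
\[
  \hat{\theta} \;=\; \frac{1}{n}\sum_{t=1}^{n}
  \left[ G_t + \frac{\xi_t}{\pi(X_t)}\,\bigl(H_t - G_t\bigr) \right],
  \qquad
  \mathbb{E}\bigl[\hat{\theta}\bigr] = \mathbb{E}[H].
\]
```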
@adamjfisch
Adam Fisch
22 days
In our setup, we look at responses X one-by-one. For each X, we can get a cheap rating G = g(X) at a discount, but also maybe choose to get an expensive rating H = h(X). Informally, at the end of the day, we want the best unbiased estimate of E[H] we can get, within our budget.
1
1
5
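(A minimal sketch of this per-example setup, assuming hypothetical cheap_judge / expensive_judge callables and a policy pi; the inverse-probability weighting is what keeps the estimate of E[H] unbiased, though the paper's estimator is more refined.)

```python
import random

def run_eval(responses, cheap_judge, expensive_judge, pi, cost_h=1.0):
    """Process responses X one by one: always get the cheap rating G = g(X),
    and with probability pi(X) also buy the expensive rating H = h(X).
    Returns an unbiased estimate of E[H] plus the expensive-label spend.
    (Illustrative sketch with hypothetical names, not the paper's code.)"""
    total, spend = 0.0, 0.0
    for x in responses:
        g = cheap_judge(x)                 # cheap rating, always available at a discount
        p = pi(x)
        if random.random() < p:            # coin flip: pay for the expensive rating?
            h = expensive_judge(x)
            spend += cost_h
            total += g + (h - g) / p       # inverse-probability correction keeps it unbiased
        else:
            total += g
    return total / len(responses), spend
```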
@adamjfisch
Adam Fisch
4 months
RT @JonathanBerant: Hi ho! New work: with amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fan…
0
17
0
@adamjfisch
Adam Fisch
7 months
RT @stats_stephen: Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers usin…
0
6
0
@adamjfisch
Adam Fisch
8 months
RT @ml_angelopoulos: 🚨 New Textbook on Conformal Prediction 🚨 “The goal of this book is to teach the reader about…
0
90
0
@adamjfisch
Adam Fisch
8 months
Check out our new paper on Recursive Transformers. Great having Sangmin here at @GoogleDeepMind to lead it! Particularly excited about the potential of continuous depth-wise batching for much better early-exiting batch throughput.
@raymin0223
Sangmin Bae
8 months
🚀 Excited to share our latest research @GoogleDeepMind on ♻️Recursive Transformers! We make smaller LMs by "sharing parameters" across layers. A novel serving paradigm, ✨Continuous Depth-wise Batching, with 🏃Early-Exiting, could significantly boost their decoding speed! 🧵👇
2
4
30
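(A toy sketch of the parameter-sharing idea only, assuming PyTorch; the actual Recursive Transformer architecture, weight-tying scheme, and continuous depth-wise batching are more involved.)

```python
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    """Toy 'recursive' idea: one shared transformer layer applied several times
    (illustrative only; the paper's architecture and tying scheme may differ)."""

    def __init__(self, d_model=256, n_heads=4, n_recursions=4):
        super().__init__()
        # A single layer whose weights are reused at every depth.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.n_recursions = n_recursions

    def forward(self, x):
        for _ in range(self.n_recursions):   # same parameters, applied repeatedly
            x = self.shared_layer(x)
        return x

tokens = torch.randn(2, 16, 256)             # (batch, seq_len, d_model)
print(RecursiveBlock()(tokens).shape)        # torch.Size([2, 16, 256])
```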
@adamjfisch
Adam Fisch
9 months
RT @aviral_kumar2: This work was led by the amazing @setlur_amrith during his internship at Google Research. With @nagpalchirag, @adamjfisc…
0
2
0
@adamjfisch
Adam Fisch
9 months
RT @aviral_kumar2: 🚨 New paper led by @setlur_amrith on process rewards for reasoning! Our PRMs that model a specific notion of "progress" re…
0
19
0
@adamjfisch
Adam Fisch
9 months
RT @setlur_amrith: 🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x…
0
41
0
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch @ml_angelopoulos Check out the paper for more details! Fun work done together with a great team: @maynez_joshua, @rhofour, @bhuwandhingra, @amirgloberson, and @professorwcohen.
0
0
3
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch @ml_angelopoulos In particular, when the data / autorater is heterogeneous (which we partition based on autorater confidence), we find that this stratified prediction-powered approach can give us substantially tighter confidence intervals for parameters of interest, such as the mean LLM accuracy.
1
0
3
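(A rough paraphrase of how such a stratified estimator typically combines per-stratum estimates; the paper's exact construction may differ. Strata k are formed by partitioning on autorater confidence, θ̂_k is the prediction-powered estimate computed within stratum k, and N_k/N is that stratum's share of the data.)

```latex
% Generic stratified combination (paraphrase): per-stratum prediction-powered
% estimates, weighted by stratum size; variances add with squared weights.
\[
  \hat{\theta}_{\mathrm{strat}} \;=\; \sum_{k} \frac{N_k}{N}\,\hat{\theta}_k,
  \qquad
  \widehat{\mathrm{Var}}\bigl(\hat{\theta}_{\mathrm{strat}}\bigr)
  \;=\; \sum_{k} \Bigl(\frac{N_k}{N}\Bigr)^{2}\,\widehat{\mathrm{Var}}\bigl(\hat{\theta}_k\bigr).
\]
```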
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch The PPI work of @ml_angelopoulos et al. allows us to leverage the labeled data to debias the automatic predictions, so that we can get precise, valid confidence intervals for important population parameters. We further improve these estimates by leveraging stratified sampling.
1
0
4
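(The basic prediction-powered mean estimate, as I understand it from the Angelopoulos et al. PPI line of work, written for N cheap autorater scores and a small labeled subset of n human ratings; details in the paper may differ.)

```latex
% Prediction-powered estimate of the mean: cheap predictions everywhere,
% debiased by the average prediction error on the small labeled subset.
\[
  \hat{\theta}_{\mathrm{PP}}
  \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} g(X_i)}_{\text{autorater average}}
  \;+\; \underbrace{\frac{1}{n}\sum_{j=1}^{n}\bigl(H_j - g(X_j)\bigr)}_{\text{bias correction from human labels}} .
\]
```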
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch Reliable LLM eval is challenging. We can use auto metrics (e.g., LLM-as-a-judge), which are cheap but possibly inaccurate. Or we can do manual annotation, which is more accurate but expensive. The tradeoffs can vary depending on the subdomain (some are easier than others)!
1
1
4
@adamjfisch
Adam Fisch
1 year
Excited to share new work from @GoogleDeepMind / @GoogleResearch on improving LLM evals using ML predictions together with a simple but effective stratified sampling approach that strategically divides the underlying data for better performance. Paper:
5
25
125
@adamjfisch
Adam Fisch
1 year
RT @raymin0223: 🚨 Check out our new paper, Block Transformer! We propose an efficient architecture with Global-to-Local language modeling…
0
30
0