Adam Fisch Profile
Adam Fisch

@adamjfisch

Followers
1K
Following
488
Media
19
Statuses
302

Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.

Joined August 2017
@adamjfisch
Adam Fisch
22 days
You need to evaluate an AI system and you have three things:
1. A cheap judge, which is noisy 🙈
2. An expensive judge, which is accurate 🧑‍⚖️
3. A budget 💸
How should you spend the budget to get the best possible estimate of model quality?
3
10
81
@adamjfisch
Adam Fisch
22 days
Work co-led with @ml_angelopoulos, whom we had the pleasure of briefly hosting here at @GoogleDeepMind for this collaboration, together with my GDM and GR colleagues @jacobeisenstein, @JonathanBerant, and Alekh Agarwal.
2
1
3
@adamjfisch
Adam Fisch
22 days
Empirically we get some nice results: our policies achieve the same precision at far lower budgets. Good potential for eval-hungry AI model dev cycles! Still, there are gaps: estimating the active policy well relies on good uncertainty estimates, and current models have some work to do.
1
0
2
@adamjfisch
Adam Fisch
22 days
We explore how much these policies improve over the naïve empirical estimates of E[H] using synthetic + real data. The optimal pi depends on unknown distributional properties of (X, H, G), so we examine performance in theory (using oracle rules) + in practice (when approximated).
1
1
3
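(A toy synthetic comparison of the kind described above, purely for intuition: it contrasts the naive estimate of E[H] from the expensive labels a budget can buy with a basic prediction-powered correction at a uniform sampling rate. All names and numbers are made up; the paper's experiments and optimal policies are more involved.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, budget = 10_000, 1_000                          # total examples vs. expensive labels we can afford

h = rng.binomial(1, 0.7, size=n).astype(float)     # expensive ratings H (ground truth)
g = np.clip(h + rng.normal(0, 0.3, size=n), 0, 1)  # cheap ratings G: noisy but correlated with H

# Naive: spend the whole budget on expensive labels for a random subset, ignore G.
idx = rng.choice(n, size=budget, replace=False)
naive = h[idx].mean()

# PPI-style: use the cheap rating everywhere, then debias with the same budget of H's.
ppi = g.mean() + (h[idx] - g[idx]).mean()

print(f"truth={h.mean():.3f}  naive={naive:.3f}  ppi={ppi:.3f}")
```

Because Var(H − G) is much smaller than Var(H) when the cheap judge is informative, the corrected estimate is typically tighter at the same cost.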
@adamjfisch
Adam Fisch
22 days
We solve for two types of policies: (1) the best fixed sampling rate, pi_random(x) = p*, that doesn't change with X, and (2) the best fully active policy pi_active(x) ∈ (0, 1]. Intuitively, fully active is better when G has variable accuracy (e.g., we see hard + easy Xs).
1
1
3
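(An illustrative sketch of the two policy shapes, with hypothetical names and a made-up heuristic; the paper derives the actual optimal forms.)

```python
def pi_random(x, p_star):
    """Fixed-rate policy: request the expensive rating H with the same
    probability p* for every response x."""
    return p_star

def pi_active(x, predicted_error, scale, floor=1e-3):
    """Active policy: spend more of the budget where the cheap judge is expected
    to be less reliable (a heuristic stand-in for the paper's optimal rule).
    `predicted_error(x)` is a hypothetical estimate of how far G is from H on x."""
    return min(1.0, max(floor, scale * predicted_error(x)))
```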
@adamjfisch
Adam Fisch
22 days
Specifically, building on the active PPI estimator of Zrnic and Candès, we derive a family of cost-optimal policies, pi(x), that determine the best probabilities for choosing to get H_t, versus choosing to just use G_t, for each X_t.
1
1
4
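(My paraphrase of the kind of inverse-propensity-corrected estimator this builds on, written for the mean; the paper's exact active PPI construction may differ. Here ξ_t ∈ {0, 1} records whether the expensive rating H_t was actually purchased.)

```latex
% Sketch of an inverse-propensity, PPI-style mean estimate (paraphrase, not the paper's exact form).
% xi_t ~ Bernoulli(pi(X_t)) indicates whether the expensive rating H_t was collected.
\[
  \hat{\theta} \;=\; \frac{1}{n}\sum_{t=1}^{n}
  \left[ G_t + \frac{\xi_t}{\pi(X_t)}\,\bigl(H_t - G_t\bigr) \right],
  \qquad
  \mathbb{E}\bigl[\hat{\theta}\bigr] = \mathbb{E}[H].
\]
```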
@adamjfisch
Adam Fisch
22 days
In our setup, we look at responses X one-by-one. For each X, we can get a cheap rating G = g(X) at a discount, but also maybe choose to get an expensive rating H = h(X). Informally, at the end of the day, we want the best unbiased estimate of E[H] we can get, within our budget.
1
1
5
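(A minimal sketch of this per-example setup, assuming hypothetical cheap_judge / expensive_judge callables and a policy pi; the inverse-probability weighting is what keeps the estimate of E[H] unbiased, though the paper's estimator is more refined.)

```python
import random

def run_eval(responses, cheap_judge, expensive_judge, pi, cost_h=1.0):
    """Process responses X one by one: always get the cheap rating G = g(X),
    and with probability pi(X) also buy the expensive rating H = h(X).
    Returns an unbiased estimate of E[H] plus the expensive-label spend.
    (Illustrative sketch with hypothetical names, not the paper's code.)"""
    total, spend = 0.0, 0.0
    for x in responses:
        g = cheap_judge(x)                 # cheap rating, always available at a discount
        p = pi(x)
        if random.random() < p:            # coin flip: pay for the expensive rating?
            h = expensive_judge(x)
            spend += cost_h
            total += g + (h - g) / p       # inverse-probability correction keeps it unbiased
        else:
            total += g
    return total / len(responses), spend
```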
@adamjfisch
Adam Fisch
4 months
RT @JonathanBerant: Hi ho! New work: with amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fan…
0
17
0
@adamjfisch
Adam Fisch
7 months
RT @stats_stephen: Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers usin…
0
6
0
@adamjfisch
Adam Fisch
8 months
RT @ml_angelopoulos: 🚨 New Textbook on Conformal Prediction 🚨 “The goal of this book is to teach the reader about…
0
90
0
@adamjfisch
Adam Fisch
8 months
Check out our new paper on Recursive Transformers. Great having Sangmin here at @GoogleDeepMind to lead it! Particularly excited about the potential of continuous depth-wise batching for much better early-exiting batch throughput.
@raymin0223
Sangmin Bae
8 months
🚀 Excited to share our latest research @GoogleDeepMind on ♻️Recursive Transformers! We make smaller LMs by "sharing parameters" across layers. A novel serving paradigm, ✨Continuous Depth-wise Batching, with 🏃Early-Exiting, could significantly boost their decoding speed! 🧵👇
2
4
30
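(A toy sketch of the parameter-sharing idea only, assuming PyTorch; the actual Recursive Transformer architecture, weight-tying scheme, and continuous depth-wise batching are more involved.)

```python
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    """Toy 'recursive' idea: one shared transformer layer applied several times
    (illustrative only; the paper's architecture and tying scheme may differ)."""

    def __init__(self, d_model=256, n_heads=4, n_recursions=4):
        super().__init__()
        # A single layer whose weights are reused at every depth.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.n_recursions = n_recursions

    def forward(self, x):
        for _ in range(self.n_recursions):   # same parameters, applied repeatedly
            x = self.shared_layer(x)
        return x

tokens = torch.randn(2, 16, 256)             # (batch, seq_len, d_model)
print(RecursiveBlock()(tokens).shape)        # torch.Size([2, 16, 256])
```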
@adamjfisch
Adam Fisch
9 months
RT @aviral_kumar2: This work was led by the amazing @setlur_amrith during his internship at Google Research. With @nagpalchirag, @adamjfisc…
0
2
0
@adamjfisch
Adam Fisch
9 months
RT @aviral_kumar2: 🚨 New paper led by @setlur_amrith on process rewards for reasoning! Our PRMs that model a specific notion of "progress" re…
0
19
0
@adamjfisch
Adam Fisch
9 months
RT @setlur_amrith: 🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x…
0
41
0
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch @ml_angelopoulos Check out the paper for more details! Fun work done together with a great team: @maynez_joshua, @rhofour, @bhuwandhingra, @amirgloberson, and @professorwcohen.
0
0
3
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch @ml_angelopoulos In particular, when the data / autorater is heterogeneous (which we partition based on autorater confidence), we find that this stratified prediction-powered approach can give us substantially tighter confidence intervals for parameters of interest, such as the mean LLM accuracy.
1
0
3
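(A rough paraphrase of how such a stratified estimator typically combines per-stratum estimates; the paper's exact construction may differ. Strata k are formed by partitioning on autorater confidence, θ̂_k is the prediction-powered estimate computed within stratum k, and N_k/N is that stratum's share of the data.)

```latex
% Generic stratified combination (paraphrase): per-stratum prediction-powered
% estimates, weighted by stratum size; variances add with squared weights.
\[
  \hat{\theta}_{\mathrm{strat}} \;=\; \sum_{k} \frac{N_k}{N}\,\hat{\theta}_k,
  \qquad
  \widehat{\mathrm{Var}}\bigl(\hat{\theta}_{\mathrm{strat}}\bigr)
  \;=\; \sum_{k} \Bigl(\frac{N_k}{N}\Bigr)^{2}\,\widehat{\mathrm{Var}}\bigl(\hat{\theta}_k\bigr).
\]
```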
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch The PPI work of @ml_angelopoulos et al. allows us to leverage the labeled data to debias the automatic predictions, so that we can get precise, valid confidence intervals for important population parameters. We further improve these estimates by leveraging stratified sampling.
1
0
4
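(The basic prediction-powered mean estimate, as I understand it from the Angelopoulos et al. PPI line of work, written for N cheap autorater scores and a small labeled subset of n human ratings; details in the paper may differ.)

```latex
% Prediction-powered estimate of the mean: cheap predictions everywhere,
% debiased by the average prediction error on the small labeled subset.
\[
  \hat{\theta}_{\mathrm{PP}}
  \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} g(X_i)}_{\text{autorater average}}
  \;+\; \underbrace{\frac{1}{n}\sum_{j=1}^{n}\bigl(H_j - g(X_j)\bigr)}_{\text{bias correction from human labels}} .
\]
```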
@adamjfisch
Adam Fisch
1 year
@GoogleDeepMind @GoogleResearch Reliable LLM eval is challenging. We can use auto metrics (e.g., LLM-as-a-judge), which are cheap but possibly inaccurate. Or we can do manual annotation, which is more accurate but expensive. The tradeoffs can vary depending on the subdomain (some are easier than others)!
1
1
4
@adamjfisch
Adam Fisch
1 year
Excited to share new work from @GoogleDeepMind / @GoogleResearch on improving LLM evals using ML predictions together with a simple but effective stratified sampling approach that strategically divides the underlying data for better performance. Paper:
5
25
125
@adamjfisch
Adam Fisch
1 year
RT @raymin0223: 🚨 Check out our new paper, Block Transformer! We propose an efficient architecture with Global-to-Local language modeling…
0
30
0