Stephen Pfohl Profile
Stephen Pfohl

@stephenpfohl

Followers
1K
Following
2K
Media
9
Statuses
270

Research Scientist at Google Research. Researching #fairness #transparency #causality #healthcare #healthequity

Joined March 2009
@stephenpfohl
Stephen Pfohl
7 months
RT @jaldmn: Really great to see this published! So, what is this all about? 🧵 (1/
0
11
0
@stephenpfohl
Stephen Pfohl
9 months
Also want to thank the editorial team (@NatureMedicine, @LorenzoRighett7) and reviewers (@MMccradden and anonymous reviewers) for their constructive feedback and support for the work.
0
0
5
@stephenpfohl
Stephen Pfohl
9 months
This work represents the contributions of many amazing collaborators from @GoogleAI @GoogleDeepMind @MIT @UAlberta, including @hcolelewis @thekaransinghal @DrNealResearch @dr_nyamewaa @adoubleva @weballergy @AziziShekoofeh @negar_rz @LiamGMcCoy @HardyShakerman and many more!
1
0
8
@stephenpfohl
Stephen Pfohl
9 months
As an update for the published version of the paper, we now make available as supplementary data the set of LLM outputs and human ratings under the proposed assessment rubrics for each of the datasets studied in the work.
1
0
3
@stephenpfohl
Stephen Pfohl
9 months
For an extended summary of key takeaways, check out my prior post about the preprint version of the work:
@stephenpfohl
Stephen Pfohl
1 year
Excited to share new work on surfacing health equity-related biases in LLMs. We design rubrics covering 6 dimensions of bias, release EquityMedQA, a collection of 7 adversarial datasets, and conduct a large-scale human evaluation study with Med-PaLM 2.
1
1
4
@stephenpfohl
Stephen Pfohl
9 months
We evaluate Med-PaLM 2 outputs to EquityMedQA questions using our proposed rubrics with physician, health equity expert, and consumer raters, reflecting varying types of expertise, backgrounds, and lived experiences.
1
0
4
@stephenpfohl
Stephen Pfohl
9 months
We use an iterative participatory approach to design assessment rubrics for human evaluation of LLM outputs for equity-related harms and biases and create EquityMedQA, a collection of seven adversarial datasets for medical QA enriched for equity-related content.
1
0
4
@stephenpfohl
Stephen Pfohl
9 months
Excited to announce that our paper, “A toolbox for surfacing health equity harms and biases in large language models,” is now published with @NatureMedicine:
@NatureMedicine
Nature Medicine
9 months
Identifying a complex panel of bias dimensions to be evaluated, a framework is proposed to assess how prone large language models are to biased reasoning, with possible consequences on equity-related harms.
1
9
58
@stephenpfohl
Stephen Pfohl
1 year
Made it to Rio for #FAccT2024
3
0
33
@stephenpfohl
Stephen Pfohl
1 year
RT @zakkohane: Red teaming LLM’s w @TristanNaumann @MSFTResearch @stephenpfohl @GoogleResearch #SAIL2024 #AI detecting threats across t…
0
7
0
@stephenpfohl
Stephen Pfohl
1 year
RT @SAILhealth: As our attendees settle in the @hyattregencypr, we get ready for a deep dive into medical AI with @TristanNaumann and @step…
0
4
0
@stephenpfohl
Stephen Pfohl
1 year
Landed in Puerto Rico for #SAIL2024
0
0
27
@stephenpfohl
Stephen Pfohl
1 year
RT @thekaransinghal: Excited to share our newest work! 📝 Evaluation of LLMs is hard, especially for health equity. We provide a multifacete…
0
26
0
@stephenpfohl
Stephen Pfohl
1 year
This work represents the contributions of many amazing collaborators from @GoogleAI, @GoogleDeepMind, @MIT, @UAlberta, including @hcolelewis @thekaransinghal @DrNealResearch @dr_nyamewaa @adoubleva @weballergy @AziziShekoofeh @negar_rz @LiamGMcCoy @HardyShakerman and many more!
0
2
6
@stephenpfohl
Stephen Pfohl
1 year
Our approach does not cover all relevant modes of bias, does not allow for direct identification of the causes of harm or bias, and does not enable reasoning about downstream effects on outcomes if an LLM were to be deployed for a real-world use case and population.
1
0
3
@stephenpfohl
Stephen Pfohl
1 year
There are several limitations in scope worth mentioning. Our approach is restricted to adversarial testing to surface equity-related biases, and is complementary to other quantitative and qualitative evaluation paradigms relevant to reasoning about equity-related harms.
1
0
2
@stephenpfohl
Stephen Pfohl
1 year
Our results suggest that our approach is more sensitive both to detecting biases in model outputs and to detecting improvements across pairs of outputs. Furthermore, the use of multiple rater groups, assessment rubrics, and datasets helps to surface bias along several dimensions.
1
0
2
@stephenpfohl
Stephen Pfohl
1 year
In our empirical study, we evaluate Med-PaLM 2 outputs on EquityMedQA datasets using our proposed rubrics with physician, health equity expert, and consumer raters, reflecting varying types of expertise, backgrounds, and lived experiences.
1
0
2
@stephenpfohl
Stephen Pfohl
1 year
It also reflects a broad set of approaches to dataset creation, including manual curation of questions grounded in specific topic areas or observed model failures, as well as semi-automated approaches to generating adversarial questions with LLMs.
1
0
2