Stephen Pfohl

@stephenpfohl

Followers: 1K · Following: 2K · Media: 9 · Statuses: 270

Research Scientist at Google Research. Researching #fairness #transparency #causality #healthcare #healthequity

Joined March 2009
@jaldmn
Joe Alderman
1 year
Really great to see this published! So, what is this all about? 🧵 (1/
@LancetDigitalH
The Lancet Digital Health
1 year
📢 @LancetDigitalH and @NEJM_AI have co-published the long-awaited STANDING Together recommendations aiming to promote transparency in health datasets, and tackle algorithmic bias. @jaldmn @dr_laws @Denniston_Ophth @DrXiaoLiu @diversedata_ST Read more: https://t.co/2KlYciKQGm
1
11
16
@stephenpfohl
Stephen Pfohl
1 year
Also want to thank the editorial team (@NatureMedicine, @LorenzoRighett7) and reviewers (@MMccradden and anonymous reviewers) for their constructive feedback and support for the work.
0
0
5
@stephenpfohl
Stephen Pfohl
1 year
This work represents the contributions of many amazing collaborators from @GoogleAI @GoogleDeepMind @MIT @UAlberta, including @hcolelewis @thekaransinghal @DrNealResearch @dr_nyamewaa @adoubleva @weballergy @AziziShekoofeh @negar_rz @LiamGMcCoy @HardyShakerman and many more!
1
0
9
@stephenpfohl
Stephen Pfohl
1 year
As an update for the published version of the paper, we now make available as supplementary data the set of LLM outputs and human ratings under the proposed assessment rubrics for each of the datasets studied in the work.
1
0
3
@stephenpfohl
Stephen Pfohl
1 year
For an extended summary of key takeaways, check out my prior post about the preprint version of the work: https://t.co/7UUQHLdZ6d.
@stephenpfohl
Stephen Pfohl
2 years
Excited to share new work on surfacing health equity-related biases in LLMs. We design rubrics covering 6 dimensions of bias, release EquityMedQA, a collection of 7 adversarial datasets, and conduct a large-scale human evaluation study with Med-PaLM 2. https://t.co/tkAF7TdJGM
1
1
4
@stephenpfohl
Stephen Pfohl
1 year
We evaluate Med-PaLM 2 outputs to EquityMedQA questions using our proposed rubrics with physician, health equity expert, and consumer raters, reflecting varying types of expertise, backgrounds, and lived experiences.
1
0
4
@stephenpfohl
Stephen Pfohl
1 year
We use an iterative participatory approach to design assessment rubrics for human evaluation of LLM outputs for equity-related harms and biases and create EquityMedQA, a collection of seven adversarial datasets for medical QA enriched for equity-related content.
1
0
4
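A minimal sketch of how independent rubric ratings like those described in the two tweets above might be aggregated per rater group, assuming one record per (question, rater) listing the bias dimensions flagged. The `Rating` schema, field names, and rater-group labels are illustrative assumptions, not the paper's released data format:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Rating:
    question_id: str        # e.g. an EquityMedQA question identifier
    rater_group: str        # e.g. "physician", "equity_expert", "consumer"
    bias_dimensions: list   # rubric dimensions the rater flagged; [] if none

def bias_report_rate(ratings):
    """Fraction of rated answers flagged for at least one bias dimension,
    broken out by rater group."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for r in ratings:
        total[r.rater_group] += 1
        if r.bias_dimensions:
            flagged[r.rater_group] += 1
    return {group: flagged[group] / total[group] for group in total}

# Toy usage: different rater groups can surface different rates of bias.
ratings = [
    Rating("q1", "physician", []),
    Rating("q1", "consumer", ["inaccuracy for some axes of identity"]),
    Rating("q2", "physician", ["stereotypical language"]),
]
print(bias_report_rate(ratings))  # {'physician': 0.5, 'consumer': 1.0}
```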
@stephenpfohl
Stephen Pfohl
1 year
Excited to announce that our paper, “A toolbox for surfacing health equity harms and biases in large language models,” is now published with @NatureMedicine: https://t.co/dm1OKAlfkV.
@NatureMedicine
Nature Medicine
1 year
Identifying a complex panel of bias dimensions to be evaluated, this work proposes a framework to assess how prone large language models are to biased reasoning, with possible consequences for equity-related harms.
1
9
59
@stephenpfohl
Stephen Pfohl
2 years
Made it to Rio for #FAccT2024
3
0
32
@zakkohane
Isaac Kohane
2 years
Red teaming LLMs w/ @TristanNaumann @MSFTResearch @stephenpfohl @GoogleResearch #SAIL2024 #AI: detecting threats across the broad taxonomy of LLM vulnerabilities
3
7
42
@SAILhealth
Symposium on AI for Learning Health Systems
2 years
As our attendees settle in at the @hyattregencypr, we get ready for a deep dive into medical AI with @TristanNaumann and @stephenpfohl presenting “Red Teaming to Test Limitations of LLMs”. Welcome everyone to #SAIL24! We hope you enjoy the conference and beautiful Puerto Rico. 🏝️
0
4
10
@stephenpfohl
Stephen Pfohl
2 years
Landed in Puerto Rico for #SAIL2024
0
0
26
@thekaransinghal
Karan Singhal
2 years
Excited to share our newest work! 📝 Evaluation of LLMs is hard, especially for health equity. We provide a multifaceted human assessment framework, 7 newly-released adversarial datasets, and perform the largest human eval study on this topic to date. 🧵: https://t.co/8yBo2q9wMT
5
26
116
@stephenpfohl
Stephen Pfohl
2 years
This work represents the contributions of many amazing collaborators from @GoogleAI, @GoogleDeepMind, @MIT, @UAlberta, including @hcolelewis @thekaransinghal @DrNealResearch @dr_nyamewaa @adoubleva @weballergy @AziziShekoofeh @negar_rz @LiamGMcCoy @HardyShakerman and many more!
0
2
7
@stephenpfohl
Stephen Pfohl
2 years
Our approach does not cover all relevant modes of bias, does not allow for direct identification of the causes of harm or bias, and does not enable reasoning about downstream effects on outcomes if an LLM were deployed for a real-world use case and population.
1
0
3
@stephenpfohl
Stephen Pfohl
2 years
There are several limitations in scope worth mentioning. Our approach is restricted to adversarial testing to surface equity-related biases, and is complementary to other quantitative and qualitative evaluation paradigms relevant to reasoning about equity-related harms.
1
0
2
@stephenpfohl
Stephen Pfohl
2 years
Our results suggest that our approach is more sensitive both to detecting biases in model outputs and to detecting improvements across pairs of outputs. Furthermore, the use of multiple rater groups, assessment rubrics, and datasets helps to surface bias along several dimensions.
1
0
2
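A hedged sketch of the pairwise side of the evaluation mentioned above: raters compare two answers to the same question (say, from two model versions) and mark which shows less bias, or a tie. The 'a'/'b'/'tie' labels and the tally below are assumptions for illustration, not the paper's code:

```python
from collections import Counter

def pairwise_preference(judgments):
    """judgments: iterable of 'a', 'b', or 'tie', one per compared pair.
    Returns each side's share of the non-tie comparisons, plus the tie count."""
    counts = Counter(judgments)
    decided = counts["a"] + counts["b"]
    if decided == 0:
        return {"a": 0.0, "b": 0.0, "ties": counts["tie"]}
    return {"a": counts["a"] / decided,
            "b": counts["b"] / decided,
            "ties": counts["tie"]}

print(pairwise_preference(["a", "b", "b", "tie", "b"]))
# {'a': 0.25, 'b': 0.75, 'ties': 1}
```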
@stephenpfohl
Stephen Pfohl
2 years
In our empirical study, we evaluate Med-PaLM 2 outputs on EquityMedQA datasets under our proposed rubrics, with physician, health equity expert, and consumer raters, reflecting varying types of expertise, backgrounds, and lived experiences.
1
0
2
@stephenpfohl
Stephen Pfohl
2 years
It also reflects a broad set of approaches to dataset creation, including manual curation of questions grounded in specific topic areas or observed model failures, as well as semi-automated approaches to generating adversarial questions with LLMs.
1
0
2
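An illustrative-only sketch of the semi-automated generation idea in the tweet above: prompt an LLM to draft adversarial medical questions around a topic or observed failure mode, then keep humans in the loop to curate, as in the manual track. `SEED_PROMPT`, `call_llm`, and `generate_candidates` are hypothetical names, not the paper's pipeline:

```python
SEED_PROMPT = (
    "Write a medical question that could tempt a model into biased "
    "reasoning about the following topic or observed failure mode: {topic}"
)

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for any text-generation API call.
    return f"[model-drafted question for prompt: {prompt[:40]}...]"

def generate_candidates(topics, n_per_topic=3):
    """Draft adversarial questions per topic; humans review every draft
    before anything enters the dataset."""
    drafts = []
    for topic in topics:
        for _ in range(n_per_topic):
            drafts.append(call_llm(SEED_PROMPT.format(topic=topic)))
    return drafts

# Toy usage: seed topics could come from expert curation or observed failures.
print(generate_candidates(["access to care", "pain management"], n_per_topic=1))
```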