Karan Singhal Profile
Karan Singhal

@thekaransinghal

Followers
5K
Following
341
Media
34
Statuses
129

Health AI @OpenAI, Prev @GoogleAI

San Francisco, CA
Joined April 2008
@thekaransinghal
Karan Singhal
2 months
📣 Proud to share HealthBench, an open-source benchmark from our Health AI team at OpenAI, measuring LLM performance and safety across 5000 realistic health conversations. 🧵 Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562
26
76
409
@thekaransinghal
Karan Singhal
15 days
RT @MilesKWang: We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We f…
0
421
0
@thekaransinghal
Karan Singhal
15 days
RT @gdb: o3 for analyzing your health data:
0
81
0
@thekaransinghal
Karan Singhal
2 months
RT @_jasonwei: New HealthBench eval! Very excited we (@OpenAI) are investing in AI for health, a defining use case for AGI. Favorite plot i…
0
38
0
@thekaransinghal
Karan Singhal
2 months
RT @rahularoradfs: Very proud to share #HeathBench with the world, alongside @thekaransinghal @_jasonwei’s @KidsBalanced @jquinonero and ma…
0
7
0
@thekaransinghal
Karan Singhal
2 months
This work would not have been possible without the unrelenting care and hard work of many, especially our co-authors (@rahularoradfs @_jasonwei @KidsBalanced @jquinonero @alexbeutel @JoHeidecke and others below) and 262 members of our physician cohort. Those who wished to be
2
2
25
@thekaransinghal
Karan Singhal
2 months
We designed HealthBench for two audiences:
- AI research community: to shape shared standards and incentivize models that benefit humanity.
- Healthcare: to provide high-quality evidence, towards a better understanding of current and future use cases and limitations.
We hope that
2
1
17
@thekaransinghal
Karan Singhal
2 months
We believe health evals should be trustworthy. We measured agreement of our model-based grading against physician grading on HealthBench Consensus, and found that models matched a median physician for 6/7 areas, indicating that HealthBench scores correspond to physician judgment.
1
1
12
@thekaransinghal
Karan Singhal
2 months
As a bonus, we introduce two additional members of the HealthBench family: HealthBench Hard and HealthBench Consensus, which are designed to be especially difficult and physician-validated, respectively. The top model scores just 32% on HealthBench Hard, making it a worthy target
1
3
19
@thekaransinghal
Karan Singhal
2 months
Reliability is critical in healthcare–one bad response can outweigh many good ones. We measure worst-case performance at k samples across HealthBench, and find that o3 has more than twice the worst-case score at 16 samples compared to GPT-4o.
1
1
17
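One way to read "worst-case performance at k samples" is as the expected minimum score when k responses are drawn for the same example, averaged over examples. The sketch below is an assumption about that reading, not the HealthBench implementation: the function names `expected_worst_of_k` and `worst_at_k` are hypothetical, and the scores in the usage note are made up for illustration.

```python
from math import comb
from statistics import mean


def expected_worst_of_k(scores, k):
    """Expected minimum of k scores drawn without replacement from `scores`."""
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    ordered = sorted(scores)  # ascending, so ordered[0] is the worst sample
    total = comb(n, k)        # number of distinct k-subsets
    # ordered[i] is the minimum of a k-subset exactly when it is picked
    # together with k-1 of the n-1-i samples ranked above it.
    weighted = sum(x * comb(n - 1 - i, k - 1) for i, x in enumerate(ordered))
    return weighted / total


def worst_at_k(per_example_scores, k):
    """Average the expected worst-of-k score over all examples."""
    return mean(expected_worst_of_k(s, k) for s in per_example_scores)
```

For instance, `worst_at_k([[0.2, 0.8], [0.6, 0.6]], k=2)` takes the lower score of each pair (0.2 and 0.6) and averages them. At k=1 the same call reduces to the plain mean score, so the gap between the two values is a rough measure of how much unreliability costs.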
@thekaransinghal
Karan Singhal
2 months
We compare our models to other model providers’, stratified by focus areas. o3 performs best overall but headroom remains.
1
2
14
@thekaransinghal
Karan Singhal
2 months
Using HealthBench, we see that our Apr ‘25 models define a new frontier of performance at cost, with GPT-4.1 nano outperforming GPT-4o (Aug ‘24), despite being 25x cheaper. The difference b/w o3 and GPT-4o (.28) is greater than b/w GPT-4o and GPT-3.5 Turbo (.16).
1
4
23
@thekaransinghal
Karan Singhal
2 months
We built HealthBench over the last year, working with 262 physicians across 26 specialties with practice experience in 60 countries (below), across selecting focus areas, generating relevant and difficult examples, annotating examples, and validating every step along the way.
1
2
25
@thekaransinghal
Karan Singhal
3 months
RT @_jasonwei: New benchmark for deep research agents! An agent that is creative and persistent should be able to find any piece of informa…
0
65
0
@thekaransinghal
Karan Singhal
5 months
OpenAI's Health AI team is now hiring backend/fullstack SWEs towards our mission of universalizing access to health information!
Please apply if you:
- Can write maintainable, high-quality backend / fullstack code at high velocity.
- Are willing to run through walls towards this
48
95
651
@thekaransinghal
Karan Singhal
5 months
RT @Felipe_Millon: Today, we at OpenAI launched Deep Researcher and I wanted to share a deeply personal story about how amazing this tool i…
0
843
0
@thekaransinghal
Karan Singhal
6 months
“Collaboration between clinicians and vision–language models in radiology report generation”: state-of-the-art chest x-ray report generation, extensive human evaluation, and exploration of human-AI collaboration to improve clinical utility. 🔗 Read more:
0
0
4
@thekaransinghal
Karan Singhal
6 months
Two other recent Nature Medicine publications from this amazing team: “A toolbox for surfacing health equity harms and biases in large language models”: a framework for studying health equity and bias in LLMs, 7 newly-released datasets, and the largest-scale empirical study in
1
0
4
@thekaransinghal
Karan Singhal
6 months
(Pardon the interruption in your regularly scheduled AGI programming!) 📄 The Med-PaLM 2 paper is now published in Nature Medicine! The work demonstrated that with some tuning, LLMs could not only perform well on medical exams, but could also answer consumer medical questions
3
13
129
@thekaransinghal
Karan Singhal
7 months
RT @tejalpatwardhan: ChatGPT is now free in WhatsApp! I’m super excited about this, big step towards broadly accessible benefits (especiall…
0
27
0