Jonathan H Chen MD PhD

@jonc101x

Followers
3K
Following
5K
Media
347
Statuses
3K

Physician Data Scientist - Stanford Center for Biomedical Informatics Research + Division of Hospital Medicine + Clinical Excellence Research Center

Joined April 2014
@jonc101x
Jonathan H Chen MD PhD
12 hours
single human expert (p < 0.001). Our benchmark provides evidence of LMs approaching expert-level ability in validating AI-generated medical text."
0
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
average F1 scores from 66% to 83%. Despite strong baseline performance, MedVAL improves the best-performing proprietary LM (GPT-4o) by 8% without training on physician-labeled data, demonstrating a performance statistically non-inferior to a
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
medical tasks capturing real-world challenges. Across 10 state-of-the-art LMs spanning open-source and proprietary models, MedVAL distillation significantly improves (p < 0.001) alignment with physicians across seen and unseen tasks, increasing
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset of 840 physician-annotated outputs across 6 diverse
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a novel, self-supervised, data-efficient distillation method that leverages synthetic data to train evaluator LMs to
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the "LM-as-judge" paradigm (a LM evaluating another LM) offers
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
Abstract: "With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review.
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
by learning priors from corrupted data, advised by Jon Tamir and Alex Dimakis.
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
and 3) detection of underdiagnosed medical conditions using opportunistic imaging. Before joining Stanford, he completed a Master’s in Electrical and Computer Engineering at UT Austin, where he worked on improving medical image reconstruction
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
AI and expert clinician-level performance. His recent projects focus on 1) improving LLMs as expert-level evaluators of AI-generated medical text, 2) improving robustness of language model benchmarks across diverse medical tasks using prompt optimization,
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
Bio: Asad is a research staff member at Stanford, advised by Akshay Chaudhari. His research broadly focuses on developing machine learning methods for healthcare applications. More concretely, he is interested in building scalable, self-supervised methods to help bridge the gap between
1
0
0
@jonc101x
Jonathan H Chen MD PhD
12 hours
“MedVAL: Toward Expert-Level Medical Text Validation with Language Models” Asad Aali, MS. Thursday, October 30th, 2025, 12:00 to 1:00 pm PST. Live stream: https://t.co/pxjLh1bkgO (Webinar ID: 978 8759 6012, Webinar Passcode: 420642)
1
0
1
@NEJM_AI
NEJM AI
6 days
“Passion” isn’t a prerequisite. On @NEJM_AI Grand Rounds, Dr. Jonathan Chen (@jonc101x) describes growing into medicine — and why honesty about motivation helps real patients, not résumés. Hear more from Dr. Chen in the full episode: https://t.co/VIs8eOvFWE #MedTwitter
1
3
10
@jonc101x
Jonathan H Chen MD PhD
7 days
in real-world settings."
0
0
1
@jonc101x
Jonathan H Chen MD PhD
7 days
and care outcomes. This talk will describe how Comet is trained across diverse health systems, what scaling reveals about generalization and medical reasoning, and how these capabilities can be applied to improve prediction, discovery, and patient outcomes
1
0
0
@jonc101x
Jonathan H Chen MD PhD
7 days
Abstract: "Generative models have the potential to transform how health systems learn from data. Comet, Epic’s large-scale generative medical model, is designed to represent patient histories as sequences of clinical events, enabling reasoning about disease trajectories
1
0
0
@jonc101x
Jonathan H Chen MD PhD
7 days
Bio: Software developer and lead of Comet team at Epic Systems.
1
0
0
@NEJM_AI
NEJM AI
13 days
In the latest episode of the @NEJM_AI Grand Rounds podcast, Dr. Jonathan Chen (@jonc101x) discusses his path from teenage programmer to @Stanford physician-informatician and why machine learning has both thrilled and unnerved him. Listen now: https://t.co/VIs8eOvFWE
1
13
34
@jonc101x
Jonathan H Chen MD PhD
17 days
Abstract: "The talk outlines how integrating rich clinical data with AI—especially large language models—can power “precision education” that delivers individualized, outcome-driven learning and assessment across medical training and practice."
0
1
2
@jonc101x
Jonathan H Chen MD PhD
17 days
develop personalized educational interventions. Jesse lives with his wife and two children in the Lower East Side of New York City.
1
1
2