Jonathan H Chen MD PhD
@jonc101x
Followers: 3K | Following: 5K | Media: 347 | Statuses: 3K
Physician Data Scientist - Stanford Center for Biomedical Informatics Research + Division of Hospital Medicine + Clinical Excellence Research Center
Joined April 2014
“MedVAL: Toward Expert-Level Medical Text Validation with Language Models” Asad Aali, MS. Thursday, October 30th, 2025 12:00 to 1:00 pm PST Live Stream https://t.co/pxjLh1bkgO Webinar ID: 978 8759 6012 Webinar Passcode: 420642
Bio: Asad is a research staff member at Stanford, advised by Akshay Chaudhari. His research broadly focuses on developing machine learning methods for healthcare applications. More concretely, he is interested in building scalable, self-supervised methods to help bridge the gap between AI and expert clinician-level performance. His recent projects focus on 1) improving LLMs as expert-level evaluators of AI-generated medical text, 2) improving robustness of language model benchmarks across diverse medical tasks using prompt optimization, and 3) detecting underdiagnosed medical conditions using opportunistic imaging. Before joining Stanford, he completed a Master’s in Electrical and Computer Engineering at UT Austin, where he worked on improving medical image reconstruction by learning priors from corrupted data, advised by Jon Tamir and Alex Dimakis.
Abstract: "With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the 'LM-as-judge' paradigm (an LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a novel, self-supervised, data-efficient distillation method that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset of 840 physician-annotated outputs across 6 diverse medical tasks capturing real-world challenges. Across 10 state-of-the-art LMs spanning open-source and proprietary models, MedVAL distillation significantly improves (p < 0.001) alignment with physicians across seen and unseen tasks, increasing average F1 scores from 66% to 83%. Despite strong baseline performance, MedVAL improves the best-performing proprietary LM (GPT-4o) by 8% without training on physician-labeled data, demonstrating a performance statistically non-inferior to a single human expert (p < 0.001). Our benchmark provides evidence of LMs approaching expert-level ability in validating AI-generated medical text."
“Passion” isn’t a prerequisite. On @NEJM_AI Grand Rounds, Dr. Jonathan Chen (@jonc101x) describes growing into medicine — and why honesty about motivation helps real patients, not résumés. Hear more from Dr. Chen in the full episode: https://t.co/VIs8eOvFWE
#MedTwitter
Bio: Software developer and lead of the Comet team at Epic Systems.
Abstract: "Generative models have the potential to transform how health systems learn from data. Comet, Epic’s large-scale generative medical model, is designed to represent patient histories as sequences of clinical events, enabling reasoning about disease trajectories and care outcomes. This talk will describe how Comet is trained across diverse health systems, what scaling reveals about generalization and medical reasoning, and how these capabilities can be applied to improve prediction, discovery, and patient outcomes."
In the latest episode of the @NEJM_AI Grand Rounds podcast, Dr. Jonathan Chen (@jonc101x) discusses his path from teenage programmer to @Stanford physician-informatician and why machine learning has both thrilled and unnerved him. Listen now: https://t.co/VIs8eOvFWE
Abstract: "The talk outlines how integrating rich clinical data with AI—especially large language models—can power “precision education” that delivers individualized, outcome-driven learning and assessment across medical training and practice."
develop personalized educational interventions. Jesse lives with his wife and two children in the Lower East Side of New York City.