Jonathan H Chen MD PhD @jonc101x X Profile

Followers

3K

Following

5K

Media

347

Statuses

3K

Physician Data Scientist - Stanford Center for Biomedical Informatics Research + Division of Hospital Medicine + Clinical Excellence Research Center

https://t.co/qxxBVhvot9

Joined April 2014

Jonathan H Chen MD PhD

@jonc101x

12 hours

“MedVAL: Toward Expert-Level Medical Text Validation with Language Models”
Asad Aali, MS
Thursday, October 30th, 2025, 12:00 to 1:00 pm PST
Live Stream: https://t.co/pxjLh1bkgO
Webinar ID: 978 8759 6012
Webinar Passcode: 420642

Bio: Asad is a research staff at Stanford, advised by Akshay Chaudhari. His research broadly focuses on developing machine learning methods for healthcare applications. More concretely, he is interested in building scalable, self-supervised methods to help bridge the gap between AI and expert clinician-level performance. His recent projects focus on 1) improving LLMs as expert-level evaluators of AI-generated medical text, 2) improving robustness of language model benchmarks across diverse medical tasks using prompt optimization, and 3) detection of underdiagnosed medical conditions using opportunistic imaging. Before joining Stanford, he completed a Master’s in Electrical and Computer Engineering at UT Austin, where he worked on improving medical image reconstruction by learning priors from corrupted data, advised by Jon Tamir and Alex Dimakis.

Abstract: "With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the "LM-as-judge" paradigm (a LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a novel, self-supervised, data-efficient distillation method that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset of 840 physician-annotated outputs across 6 diverse medical tasks capturing real-world challenges. Across 10 state-of-the-art LMs spanning open-source and proprietary models, MedVAL distillation significantly improves (p < 0.001) alignment with physicians across seen and unseen tasks, increasing average F1 scores from 66% to 83%. Despite strong baseline performance, MedVAL improves the best-performing proprietary LM (GPT-4o) by 8% without training on physician-labeled data, demonstrating a performance statistically non-inferior to a single human expert (p < 0.001). Our benchmark provides evidence of LMs approaching expert-level ability in validating AI-generated medical text."

NEJM AI

@NEJM_AI

6 days

“Passion” isn’t a prerequisite. On @NEJM_AI Grand Rounds, Dr. Jonathan Chen (@jonc101x) describes growing into medicine — and why honesty about motivation helps real patients, not résumés. Hear more from Dr. Chen in the full episode: https://t.co/VIs8eOvFWE #MedTwitter

Jonathan H Chen MD PhD

@jonc101x

7 days

Bio: Software developer and lead of Comet team at Epic Systems.

Abstract: "Generative models have the potential to transform how health systems learn from data. Comet, Epic’s large-scale generative medical model, is designed to represent patient histories as sequences of clinical events, enabling reasoning about disease trajectories and care outcomes. This talk will describe how Comet is trained across diverse health systems, what scaling reveals about generalization and medical reasoning, and how these capabilities can be applied to improve prediction, discovery, and patient outcomes in real-world settings."

NEJM AI

@NEJM_AI

13 days

In the latest episode of the @NEJM_AI Grand Rounds podcast, Dr. Jonathan Chen (@jonc101x) discusses his path from teenage programmer to @Stanford physician-informatician and why machine learning has both thrilled and unnerved him. Listen now: https://t.co/VIs8eOvFWE

Jonathan H Chen MD PhD

@jonc101x

17 days

develop personalized educational interventions. Jesse lives with his wife and two children in the Lower East Side of New York City.

Abstract: "The talk outlines how integrating rich clinical data with AI—especially large language models—can power “precision education” that delivers individualized, outcome-driven learning and assessment across medical training and practice."