dbittermanmd Profile Banner
Danielle Bitterman, MD Profile
Danielle Bitterman, MD

@dbittermanmd

Followers
1K
Following
1K
Media
32
Statuses
413

I'm a physician-scientist working in NLP and clinical AI evaluation. You'll find me in the lab or the rad onc clinic @ BWH | DFCI | Harvard Medical School

Boston, MA
Joined March 2017
Don't wanna be here? Send us removal request.
@dbittermanmd
Danielle Bitterman, MD
1 year
🩺💡The Bitterman lab has spend much of the past year researching #LLMs for healthcare. This post summarizes our inroads into making LLMs safer and reliable for clinicians and patients: https://t.co/tr2AIFNFti. We'll be at #EMNLP2024 - come chat if you have similar interests!
Tweet card summary image
huggingface.co
2
14
75
@Jirui_Qi
Jirui Qi @EMNLP25 ✈️
8 days
[1/2] Heading to #EMNLP2025 to present our work on multilingual reasoning. (Fri Nov 7, 12:30-13:30) We analyze the trade-off between controlling reasoning languages and accuracy. We also explore mitigations like prompt hack, post-train (and GRPO🤩) for this issue. Come say hi!
1
2
6
@shan23chen
Shan Chen
6 days
Reasoning models do not think in user's query langauge, our work will be presented by @Jirui_Qi at #EMNLP2025! Now we dive a bit more into the potential solution! We set a goal: to make models reason in the user’s language without losing accuracy. https://t.co/lucSsbYtv0
Tweet card summary image
huggingface.co
1
6
14
@shan23chen
Shan Chen
6 days
1) LLMs know a ton—but do they use it wisely? The “Physics of LMs” series says from great @ZeyuanAllenZhu : storing facts ≠ manipulating them. Our npj Digital Medicine paper shows the cost: when prompts are illogical, models can still confidently generate false medical info.
1
2
1
@shan23chen
Shan Chen
6 days
It addresses the importance of aligning models, especially on the last mile tasks. This paper was done before the o1 era, but reasoning and align the model deliberately is def one strong boost towards stronger models! Our work also got on @nytimes! https://t.co/uoX1BIMU3i
Tweet card summary image
nytimes.com
Experts weigh in on the benefits and harms.
1
1
1
@EladSharonMD
Elad Sharon
7 days
Excited to be speaking on November 4th @CancerResrch annual meeting (#FriendsAM25)! I will be speaking on #ContributionOfEffect (#COE) You can still register online: https://t.co/96BjxD2Hia
1
6
17
@npjDigitalMed
npj Digital Medicine
19 days
Researchers found GPT-4 and Llama 3 often comply with false or unsafe prompts — even when they “know” the correct answer. This “AI sycophancy” reveals a deeper risk: when models prioritize politeness and compliance over clinical accuracy, patient safety is on the line.
@MGBResearchNews
Mass General Brigham Research
19 days
A new study from @aim_harvard and colleagues found that LLMs prioritize helpfulness over accuracy in medical contexts. The study was published in @npjDigitalMed. Read more: https://t.co/WZRIhNfZWo https://t.co/W5kwci3AHA @HugoAerts @dbittermanmd
2
7
19
@Gabe__MD
Gabe Wilson MD
19 days
@jackgallifant @dbittermanmd - Excellent Editorial in NEJM AI We absolutely DO need Humanity’s Next Medical Exam to evaluate AI. EXCELLENT idea. And we need to test it on the latest most advanced models like GPT5-Pro, Grok4-Heavy My opinion: fewer than 0.1% of physicians in
Tweet card summary image
ai.nejm.org
The rapid advances in health care AI necessitate a fundamental shift in how we evaluate these systems. Palepu et al. (2025) demonstrate that AI can outperform medical trainees in breast cancer mana...
0
2
2
@MGBResearchNews
Mass General Brigham Research
19 days
A new study from @aim_harvard and colleagues found that LLMs prioritize helpfulness over accuracy in medical contexts. The study was published in @npjDigitalMed. Read more: https://t.co/WZRIhNfZWo https://t.co/W5kwci3AHA @HugoAerts @dbittermanmd
0
2
6
@dbittermanmd
Danielle Bitterman, MD
24 days
Thanks for highlighting! The good news is that @shan23chen's recipe that improved LLM safety is accessible for health systems and small academic labs like ours! We used 2 × A100 80GB and can be done in <1hr for our open source model. Estimated < $10 if using cloud GPU renting.
@rohanpaul_ai
Rohan Paul
25 days
💊 Not a very good news for Medical LLMs. A new Mass General Brigham study shows leading LLMs often try to please the user in medical chats, and to do that, can output wrong advice. Paper shows that default models will confidently echo bad medical assumptions, and that a small
0
1
16
@dbittermanmd
Danielle Bitterman, MD
24 days
LLMs tend to prioritize helpfulness > reason. We show that safety-aware, compute-efficient fine-tuning helps models reason more critically in healthcare domain, and generalizes to improved safety alignment across other domains. https://t.co/LHVFPfTbfF
1
4
21
@npjDigitalMed
npj Digital Medicine
25 days
🚨 New @npjDigitalMed study: Large language models (LLMs) trained to be “helpful” can actually spread dangerous medical misinformation. Researchers found that models like GPT-4 and Llama3 often comply with false requests—even when they know better. https://t.co/gZERHDmnlJ
Tweet card summary image
nature.com
npj Digital Medicine - When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior
2
13
37
@RKouzyMD
Ramez Kouzy, MD
29 days
We had a great time chatting with @CohenProf on our debut episode of @AIMDPodcast to discuss his recent @NEJM paper and more! @leah_minnie and I are hoping you join us on our learning journey about how AI can impact us and our profession IRL.
@AIMDPodcast
AIMDPodcast
29 days
🤖 Is medical AI turning doctors into “quantified workers”? 🩺 Harvard’s Glenn Cohen joins #AI_MD to unpack clinical surveillance, physician autonomy, and the future of medicine. 🎧 Listen now: [ https://t.co/7fcaylJXXy] #AIinHealthcare #MedTech #Bioethics
1
4
10
@shan23chen
Shan Chen
3 months
Accept as main paper at EMNLP25! Also going to present this friday at NEMI! See you then!
@shan23chen
Shan Chen
9 months
More SAE papers coming! We dived deeper, looking into what is the best way to gather the SAE features for downstream classifications and also what are the potential benefits 🧐.
1
2
16
@Jirui_Qi
Jirui Qi @EMNLP25 ✈️
3 months
Our paper on multilingual reasoning is accepted to Findings of #EMNLP2025! 🎉 (OA: 3/3/3.5/4) We show SOTA LMs struggle with reasoning in non-English languages; prompt-hack & post-training improve alignment but trade off accuracy. 📄 https://t.co/SwCR6CYpdA See you in Suzhou!
Tweet card summary image
arxiv.org
Recent Large Reasoning Models (LRMs) with thinking traces have shown strong performance on English reasoning tasks. However, their ability to think in other languages is less studied. This...
@Jirui_Qi
Jirui Qi @EMNLP25 ✈️
6 months
[1/]💡New Paper Large reasoning models (LRMs) are strong in English — but how well do they reason in your language? Our latest work uncovers their limitation and a clear trade-off: Controlling Thinking Trace Language Comes at the Cost of Accuracy 📄Link: https://t.co/SwCR6CYpdA
3
9
53
@RKouzyMD
Ramez Kouzy, MD
3 months
In this new piece just published in @LancetDigitalH we argue why we have one shot at trust with medical AI, and how we’re at risk of blowing it due to poor communication and study design. @dbittermanmd @julian_hong 1/3 https://t.co/lRwn3gcPzX
5
14
45
@UofC_CGH
UChicago CGH
4 months
Join Cancer AI Conversation on July 22, 2025. Drs. @pearsekeane (@ucl ) & @dbittermanmd (@harvardmed ) will discuss the use of synthetic data for privacy-preserving AI in a webinar moderated by Dr. @hjhanson (@ORNL ). Registration: https://t.co/rnkFOzYQ3o
1
2
6
@dbittermanmd
Danielle Bitterman, MD
4 months
Are you driven to use AI to transform patient outcomes in oncology? My lab in the AI in Medicine Program (Mass General Brigham, Harvard Medical School) is seeking Postdoc Fellows to pioneer applications of AI—especially LLMs—in cancer care. More here:
Tweet card summary image
linkedin.com
🚀 Join Us at the Forefront of AI & Cancer Care Are you driven to use cutting-edge AI to transform patient outcomes in oncology? My lab within the AI in Medicine Program (Mass General Brigham,...
0
6
29
@dbittermanmd
Danielle Bitterman, MD
6 months
Does your LRM reason in your language? Check out new preprint led by ✨ @Jirui_Qi & @shan23chen. Implications for safety/human oversight & accuracy!
@Jirui_Qi
Jirui Qi @EMNLP25 ✈️
6 months
[1/]💡New Paper Large reasoning models (LRMs) are strong in English — but how well do they reason in your language? Our latest work uncovers their limitation and a clear trade-off: Controlling Thinking Trace Language Comes at the Cost of Accuracy 📄Link: https://t.co/SwCR6CYpdA
0
1
6