Mathew Koretsky

@mkoretsky1

Followers
27
Following
26
Media
0
Statuses
18

machine learning engineer | @DataTecnica | @NIH | @BiomedArena | @uvmvermont | views/tweets are my own

Washington, DC
Joined December 2022
@mkoretsky1
Mathew Koretsky
1 month
Stay tuned to see how our Agents stack up against the latest base LLMs on biomedical question-answering tasks!
0
0
0
@mkoretsky1
Mathew Koretsky
1 month
👀 Our new Knowledge Agents make https://t.co/fknNWdWJ4k more powerful than ever. 🧬 Get insights backed by the journals and databases that biomedical researchers use daily. 📄 Read the blog for more info:
datatecnica.com
We’re excited to announce the latest update to BiomedArena.AI, the world’s first open, live platform for benchmarking LLMs on biomedical research tasks. This update deepens the platform’s ability to...
@FarazFaghri
Faraz Faghri
1 month
🧬 New at https://t.co/O7B5jBk1Jm: smarter Biomedical Knowledge Agents + Knowledge Mode We just shipped the latest update to https://t.co/O7B5jBk1Jm, the world’s first platform for benchmarking LLMs on biomedical research tasks.
1
0
0
@mkoretsky1
Mathew Koretsky
2 months
We continue to evaluate these new models on our benchmark, CARDBiomedBench. Despite significant progress, there are still no models that balance response accuracy and safety on biomedical questions 👀
@BiomedArena
BiomedArena.AI
2 months
We evaluated 12 top models using CARDBiomedBench, a biomedical benchmark with 68K+ expert QA pairs across GWAS, SMR, drug discovery & more. 🧠 No model aced both safety and accuracy. 🤖 GPT-4o = bold but risky 🤔 Claude-4.0 = cautious but wrong More is coming soon.
0
0
0
@mkoretsky1
Mathew Koretsky
2 months
Check out the latest models in @BiomedArena for all of your biomedical research questions!
@BiomedArena
BiomedArena.AI
2 months
🚀 New LLMs now LIVE on BiomedArena 🧬 Test GPT-5, Claude-4.1, Gemini 2.5 and more, on your toughest biomedical queries. All free. All benchmarked. https://t.co/kzNqodlHuk 📉 Can AI be accurate and safe in biomedicine? See the surprising results 👇🧵
0
0
1
@mkoretsky1
Mathew Koretsky
4 months
Super proud of all the hard work from our team including @tanaynayak99, @owenbianchi_, Shayan Shahand, @DanielKhashabi, and @FarazFaghri!!!
0
2
2
@mkoretsky1
Mathew Koretsky
4 months
🚨BiomedArena is live🚨 In a partnership with @lmarena_ai, our team at @DataTecnica has released a feedback-rich platform to evaluate LLM performance on real-world biomedical questions. ⚔️Access the arena: https://t.co/LjPiINtIfi 📄Read the blog post:
news.lmarena.ai
We’re honored to partner with the team at DataTecnica to advance the expansion of BiomedArena.ai: a new domain-specific evaluation track.
@arena
lmarena.ai
4 months
🧬 BiomedArena is here! We’re honored to partner with @DataTecnica and @NIH CARD, who developed BiomedArena to evaluate LLMs for biomedical discovery, and to help expand this domain-specific track in community-driven evaluations. 🧪 Biomedical science is complex, high-stakes,
1
3
8
@DanielKhashabi
Daniel Khashabi 🕊️
7 months
🚨New LLM benchmark🚨 We're releasing BiomedSQL🔬 for tabular reasoning over large-scale biomedical databases. This includes questions based on implicit scientific conventions—like statistical thresholds, effect direction, and drug approval status. 📄 Preprint:
0
8
15
@mkoretsky1
Mathew Koretsky
7 months
📄Read the preprint: https://t.co/hl9BPIxT5a 📊Dataset: https://t.co/y6MPciJSwb 💻Code: https://t.co/Qlbl33E0RL Thanks to my teammates at NIH/CARD and @DataTecnica including Maya Willey, Adi Asija, @owenbianchi, Chelsea Alvarado, @mike_nalls, @DanielKhashabi, and @FarazFaghri
0
1
1
@mkoretsky1
Mathew Koretsky
7 months
We believe this benchmark is a critical step towards building trustworthy text-to-SQL systems that can increase efficiency of lookups for PIs and SMEs, democratize access to biomedical knowledge, and accelerate discovery
1
0
0
@mkoretsky1
Mathew Koretsky
7 months
Top performers: 1) GPT-o3-mini is the top frontier model with an accuracy of 59% on BiomedSQL, significantly trailing expert-level performance (90%) 2) Our custom text-to-SQL system, BMSQL, improves execution accuracy by ~3% and response quality by ~7% over “vanilla” GPT-o3-mini
1
0
0
@mkoretsky1
Mathew Koretsky
7 months
Biomedical researchers increasingly rely on large-scale databases that store electronic health records, population-scale studies, and clinical trial information. Our team wanted to test LLMs' ability to generate valid SQL queries from questions that domain experts routinely ask
1
0
0
@mkoretsky1
Mathew Koretsky
7 months
Can LLMs perform reliably as biomedical data analysts? TL;DR: We created the first benchmark designed to challenge LLMs' ability to apply scientific reasoning in text-to-SQL generation over biomedical databases, revealing a 30-40% gap between SOTA models and expert performance
1
1
1
@DanielKhashabi
Daniel Khashabi 🕊️
7 months
Long-form inputs (e.g., needle-in-haystack setups) are the crucial aspect of high-impact LLM applications. While previous studies have flagged issues like positional bias and distracting documents, they've missed a crucial element: the size of the gold/relevant context. In our
3
21
52
@DataTecnica
DataTecnica
1 year
Great work from many of our teammates! Let's accelerate data harmonization!
@medrxivpreprint
medRxiv
1 year
A new AI-assisted data standard accelerates interoperability in biomedical research https://t.co/gdGOskf9wA #medRxiv
0
1
2
@biorxiv_genomic
bioRxiv Genomics
2 years
GenoTools: An Open-Source Python Package for Efficient Genotype Data Quality Control and Analysis https://t.co/AkpxXvHFFn #biorxiv_genomic
0
2
3
@Brain1878
Brain
3 years
Koretsky et al. use genome-wide data to cluster patients based on genetic status across risk variants for five neurodegenerative disorders. The results suggest that neurodegenerative diseases have more overlapping genetic aetiology than previously assumed. https://t.co/TD3fv3mdFx
0
9
28
@ScienceofPD
The Science of Parkinson's
3 years
Genetic risk factor clustering within and across neurodegenerative conditions (such as #Parkinsons, #Alzheimers, #ALS) - from @HamptonLLeonard & colleagues; "Neurodegenerative diseases have more overlapping genetic etiology than previously expected" https://t.co/BXdQ4FUCKi
0
3
7
@mike_nalls
Mike A. Nalls
3 years
Been a great day for publications from the team. Great job, @mkoretsky1 and the team, on showing genetic overlap and, in some instances, a depletion of genetic risk across and within neurodegenerative diseases.
@ScienceofPD
The Science of Parkinson's
3 years
Genetic risk factor clustering within and across neurodegenerative conditions (such as #Parkinsons, #Alzheimers, #ALS) - from @HamptonLLeonard & colleagues; "Neurodegenerative diseases have more overlapping genetic etiology than previously expected" https://t.co/BXdQ4FUCKi
1
6
13