Thomas Sounack
@tsounack
93 Followers · 38 Following · 1 Media · 32 Statuses
AI/ML Engineer @ Dana-Farber Cancer Institute | Stanford alum
Joined May 2024
Very excited to share the release of BioClinical ModernBERT! Highlights:
- biggest and most diverse biomedical and clinical dataset for an encoder
- 8192 context
- fastest throughput with a variety of inputs
- SOTA results across several tasks
- base and large sizes
(1/8)
If you would like more detail on any particular aspect of the guide, please don't hesitate to reach out! Your contributions are welcome and will be acknowledged.
Link to our HF collection: https://t.co/6zXS33GrY9
Link to our paper: https://t.co/UzEtzltiRr
5/5
If you are working with a lot of biomedical and/or clinical text, consider continuing MLM training of BioClinical ModernBERT on your own data! The resulting encoder will be much easier to fine-tune on your various downstream tasks (embedding model for RAG, classifier...) 4/5
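A minimal sketch of what that continued MLM training could look like with the Hugging Face Trainer; the Hub id, dataset path, and hyperparameters below are illustrative assumptions rather than the exact recipe from the guide.

```python
# Sketch: continue masked language modeling (MLM) training of BioClinical ModernBERT
# on your own clinical text with the Hugging Face Trainer.
# Assumptions: the Hub id and the local JSONL path are placeholders, use your own.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "thomas-sounack/BioClinical-ModernBERT-base"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# One note/document per row in a "text" column (placeholder file).
raw = load_dataset("json", data_files="my_clinical_notes.jsonl", split="train")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=8192),
    batched=True,
    remove_columns=raw.column_names,
)

# Dynamic masking; 30% follows the ModernBERT pretraining recipe, tune as needed.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.3)

args = TrainingArguments(
    output_dir="bioclinical-modernbert-continued",  # illustrative hyperparameters
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=3e-5,
    num_train_epochs=1,
    bf16=True,  # if your hardware supports it
    logging_steps=50,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

From there, the resulting checkpoint can be fine-tuned like any other encoder, for example with AutoModelForSequenceClassification for a classifier or as the base of an embedding model.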
You can use this same setup to continue training BioClinical ModernBERT itself. Generalization is one of the biggest strengths of our model - after being trained on a very diverse biomedical and clinical dataset, we found that it performed great on our DFCI oncology notes. 3/5
Link: https://t.co/R87dUvCyAN
The guide goes over setting up your training environment, pre-tokenizing your dataset, configuring the masked language modeling (MLM) training, and domain adaptation considerations. 2/5
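For the pre-tokenization step in particular, one common pattern (sketched here with an assumed Hub id and placeholder paths, not necessarily the guide's exact setup) is to tokenize once with datasets.map and cache the result on disk so training runs can reuse it.

```python
# Sketch: pre-tokenize a raw-text corpus once and cache it on disk so that MLM
# training can reuse it. The Hub id and file paths are placeholders.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thomas-sounack/BioClinical-ModernBERT-base")

raw = load_dataset("text", data_files={"train": "notes/*.txt"})

def tokenize(batch):
    # Truncate to the model's 8192-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = raw.map(tokenize, batched=True, num_proc=8, remove_columns=["text"])
tokenized.save_to_disk("notes_pretokenized")  # later: datasets.load_from_disk("notes_pretokenized")
```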
Want to continue training an encoder on your own data, but not sure where to start? Our step-by-step guide for reproducing the BioClinical ModernBERT training was just released! 1/5
Exciting work from @neumll!
🧬🔬⚕️ Building on the popularity of our PubMedBERT Embeddings model, we're excited to release a long context medical embeddings model! It's built on the great work below from @tsounack
Model: https://t.co/AFF9CKa8Tb
Paper: https://t.co/JJH6Tx30GJ
BioClinical ModernBERT GitHub repo is online! It contains:
- Our continued pretraining config files
- Performance eval code
- Inference speed eval code
Step-by-step guide on how to continue ModernBERT or BioClinical ModernBERT pretraining coming in the next few days!
https://t.co/xGJeik3UZb
Next demo: visualizing BioClinical-ModernBERT-base embeddings on a sphere https://t.co/2vHAxRfLX2
BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP
→ Built on ModernBERT with 8K context, RoPE, and fast unpadded inference
Trained via two-phase continued pretraining:
- Phase 1: 160.5B tokens (PubMed + PMC + 20 diverse clinical
we are so back "Mitochondria is the powerhouse of the [MASK]."
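If you want to try that [MASK] example yourself, here is a quick sketch using the transformers fill-mask pipeline; the Hub id is assumed from the release, and the top predictions will depend on the checkpoint you load.

```python
# Sketch: query the released encoder with the fill-mask pipeline.
# The Hub id is an assumption; point it at the checkpoint you actually use.
from transformers import pipeline

fill = pipeline("fill-mask", model="thomas-sounack/BioClinical-ModernBERT-base")
for pred in fill("Mitochondria is the powerhouse of the [MASK]."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```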
BioClinical ModernBERT is out! Built on the largest, most diverse biomedical/clinical dataset to date
‼️ Delivers SOTA across the board
Thrilled to be part of this effort led by @tsounack
Your daily reminder that fine-tuning is just continued pretraining. Super cool results from @antoine_chaffin, who is putting this knowledge into practice to improve medical AI:
You can just continue pre-training things ✨ Happy to announce the release of BioClinical ModernBERT, a ModernBERT model whose pre-training has been continued on medical data. The result: SOTA performance on various medical tasks with long context support and ModernBERT efficiency
Clinical encoders are joining the ModernBERT family ☺️
🚀Announcing BioClinical ModernBERT, a SOTA encoder for healthcare AI, developed by Thomas Sounack @tsounack for Dana-Farber Cancer Institute in collaboration with @Harvard, @LightOnIO, @MIT, @mcgillu, @AlbanyMed, @MSFTResearch. Seamless continued pre-training enables SOTA
Link to the models:
- https://t.co/NHGBZMS3bT
- https://t.co/qCw3rCrbJg
- https://t.co/mCD7GLzjDd
(8/8)
During benchmarking, we also observed substantially faster fine-tuning and inference with BioClinical ModernBERT. Combined with its long-context support, which enables processing a full clinical note in a single pass, this gives it strong scaling potential for clinical NLP. (7/8)
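To illustrate the single-pass point, here is a small sketch of embedding a long note without chunking; mean pooling is one common choice rather than anything prescribed by the release, and the Hub id is assumed.

```python
# Sketch: embed a full clinical note in a single forward pass using the
# 8192-token context window, instead of chunking the note.
# Mean pooling over non-padding tokens is one common pooling choice; the Hub id is assumed.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "thomas-sounack/BioClinical-ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

long_note = "Patient presents with ..."  # e.g. a full discharge summary

inputs = tokenizer(long_note, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state            # (1, seq_len, hidden_dim)

mask = inputs["attention_mask"].unsqueeze(-1).float()     # (1, seq_len, 1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens
print(embedding.shape)
```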
Excited to see how it performs on your data! In our internal evaluations, BioClinical ModernBERT significantly outperformed existing encoders - thanks to its training on diverse clinical data spanning multiple institutions, specialties, and countries. (6/8)