Neil Tenenholtz
@ntenenz
Followers: 960
Following: 614
Media: 145
Statuses: 1K
Multimodal model training for biology / healthcare at MSR
Boston, MA
Joined February 2016
If you're a PhD student interested in interning with me or one of my amazing colleagues at Microsoft Research New England (@MSRNE, @MSFTResearch) this summer, please apply here https://t.co/DIkXUuK4zc (If you'd like to work with me, please include my name in your cover letter!)
9
72
425
Don't like the status quo? Change it. You can just do things!
Since folks are discussing infra: it is not about models per se, it's about agency. Two incidents that I fondly remember: covid happened and meets was slow, so a senior engineer and a friend decided to take it into their own hands, profiling things and making it better. They did a
0
0
2
MSR NYC is hiring spring and summer interns in AI/ML/RL!
9
27
413
Applications for @MSFTResearch undergrad research internships for rising juniors and seniors are due Monday Oct 6! Apply to work with us in BioML 👉 https://t.co/QYjsdyEpRQ w/ @KevinKaichuang, @alexijielu, @lorin_crawford, Kristen Severson, @ntenenz, @SarahAlamdari, and more!
3
22
125
Microsoft Research New York City is seeking applicants for multiple Postdoctoral Researcher positions in ML/AI! These are positions for up to 2 years, starting in July 2026. Application deadline: October 22, 2025
5
46
254
Blog updated! Notably, more ablation analysis compared with other importance sampling variants.
We are glad that TIS and FlashRL have received broad attention from the open-source community and have been verified and supported (OpenRLHF @hijkzzz, SkyRL @NovaSkyAI, REINFORCE++ @hijkzzz, OAT @zzlccc)! A few updates on our blog and FlashRL package: (1) more in-depth
0
1
8
Just waiting for the rust-impl, free-threaded, easier-to-debug @astral_sh python interpreter.
Impressed by the execution of the @astral_sh team: taking over the whole Python ecosystem in a couple of months and already pushing a great product for enterprise. It's all just about execution.
0
0
1
For more info and links to all the resources, check out the blog post: https://t.co/L0RSqhPSbD And of course, a huge shoutout to the entire team for making this happen: @KevinKaichuang @SarahAlamdari Alex J Lee Kaeli Kaymak-Loveless @samir_char @garykbrixi @cdomingoenrich
microsoft.com
A collection of both protein sequence data and generative models, designed to serve as a modern resource for protein biology in the age of AI.
0
0
1
What did we uncover? 🎉 Model scale, data scale, and data diversity all positively impact E. coli expression. 🎉 ☹️ Unfortunately, common computational metrics are poor predictors of expressibility. ☹️ To all those interested in better PLM evals... let the chase begin!
1
0
3
Modern LM training is a game of 🐱 and 🐭. You improve training signal (e.g., data) to overcome gaps in eval performance, only to strengthen the evals and thus discover new gaps -- starting the cycle anew. With the Dayhoff Atlas, we aim to jumpstart the same race for PLMs. We
1
7
25
The Dayhoff Atlas: scaling sequence diversity for improved protein generation | bioRxiv https://t.co/TaHb3mFEvY
2
9
35
🧬 The largest open dataset of natural proteins in the world — 3.3 billion seqs 🧠 A 3 billion param hybrid ssm+transformer model 🤗 Fully open-source data + model https://t.co/8X0VfYIQwq Congrats to @avapamini + entire team, including @LiquidAI_'s own Kaeli Kaymak-Loveless
biorxiv.org
Modern biology is powered by the organization of biological information, a framework pioneered in 1965 by Margaret Dayhoff’s Atlas of Protein Sequence and Structure. Databases descended from this...
thrilled to share The Dayhoff Atlas of protein language data and models 🚀 protein biology in the age of AI! https://t.co/4wP9kNRUoM we built + open source the largest natural protein dataset, w/ 3.3 billion seqs & a first-in-class dataset of structure-based synthetic proteins
2
13
87
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
6
91
297
The Dayhoff Atlas! Open code. Open weights. Open datasets. Thanks @huggingface for helping to facilitate open science. https://t.co/h0WRd1Wu3I
@ClementDelangue @julien_c
huggingface.co
Our models, code, and data are openly available on GitHub, Zenodo, and Hugging Face. https://t.co/6l9iStfqDE https://t.co/T9mGmf2Pgd https://t.co/6X3VPQmJ7I
1
7
23
To the GPU-poor grad students out there, finding a better predictor of expression is one of the highest leverage contributions you could make to PLM research. Scale isn't always all you need.
I was surprised to see that BackboneRef boosts Dayhoff‑170 m pLM generations expressed in E. coli 27.6% → 51.7%, 1.9× with zero filtering ...while common metrics (pLDDT, perplexity) failed to predict wet‑lab outcomes (AUROC ≤ 0.57) This quietly re‑prioritizes how we
0
4
19
Today in @ScienceMagazine we present BioEmu1.1 @MSFTResearch. It rapidly and accurately emulates equilibrium distributions of protein dynamics at millisecond timescales. Code and datasets available on @Azure Foundry.
Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulates protein equilibrium ensembles – key for understanding protein function at scale. https://t.co/WwKjj5B0eb
5
28
168
BioEmu now published in @ScienceMagazine !! What is BioEmu? Check out this video: https://t.co/PAj96iKvR7
Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulates protein equilibrium ensembles – key for understanding protein function at scale. https://t.co/WwKjj5B0eb
15
109
409
@ClementDelangue @julien_c 🤗 Collections are great, but limiting Papers to arxiv-only leaves out much of the sciences.
0
0
2
Raising the bat signal... @ClementDelangue @julien_c
Getting ready for a release, and I'm kinda sad that @huggingface papers doesn't integrate with @biorxivpreprint
3
0
4