Neil Tenenholtz Profile
Neil Tenenholtz

@ntenenz

Followers: 960
Following: 614
Media: 145
Statuses: 1K

Multimodal model training for biology / healthcare at MSR

Boston, MA
Joined February 2016
@LesterMackey
Lester Mackey
30 days
If you're a PhD student interested in interning with me or one of my amazing colleagues at Microsoft Research New England (@MSRNE, @MSFTResearch) this summer, please apply here https://t.co/DIkXUuK4zc (If you'd like to work with me, please include my name in your cover letter!)
@ntenenz
Neil Tenenholtz
1 month
Don't like the status quo? Change it. You can just do things!
@_arohan_
rohan anil
1 month
Since folks are discussing infra: it is not about models per se, it's about agency. Two incidents that I fondly remember: COVID happened and Meet was slow, so a senior engineer and a friend decided to take it into their own hands, profiling things and making it better. They did a
@canondetortugas
Dylan Foster 🐢
2 months
MSR NYC is hiring spring and summer interns in AI/ML/RL!
@avapamini
Ava Amini
2 months
Applications for @MSFTResearch undergrad research internships for rising juniors and seniors are due Monday Oct 6! Apply to work with us in BioML 👉 https://t.co/QYjsdyEpRQ w/ @KevinKaichuang, @alexijielu, @lorin_crawford, Kristen Severson, @ntenenz, @SarahAlamdari, and more!
@LesterMackey
Lester Mackey
2 months
I’m pretty sure this is what they designed Sora 2 for (sound on) @raazdwivedi @AShettyV
@canondetortugas
Dylan Foster 🐢
2 months
Microsoft Research New York City is seeking applicants for multiple Postdoctoral Researcher positions in ML/AI! These are positions for up to 2 years, starting in July 2026. Application deadline: October 22, 2025
@zdhnarsil
Dinghuai Zhang 张鼎怀
3 months
Blog updated! Notably, it adds more ablation analysis comparing against other importance sampling variants.
@fengyao1909
Feng Yao
3 months
We are glad that TIS and FlashRL have received broad attention from the open-source community and that they have been verified and supported (OpenRLHF @hijkzzz, SkyRL @NovaSkyAI, REINFORCE++ @hijkzzz, OAT @zzlccc)! A few updates on our blog and FlashRL package: (1) more in-depth
@ntenenz
Neil Tenenholtz
3 months
Just waiting for the rust-impl, free-threaded, easier-to-debug @astral_sh python interpreter.
@samsja19
samsja
3 months
impressed by the execution of the @astral_sh team, taking over the whole Python ecosystem in a couple of months and already pushing great products for enterprise. It's all just about execution
@ntenenz
Neil Tenenholtz
4 months
For more info and links to all the resources, check out the blog post: https://t.co/L0RSqhPSbD And of course, a huge shoutout to the entire team for making this happen: @KevinKaichuang @SarahAlamdari Alex J Lee Kaeli Kaymak-Loveless @samir_char @garykbrixi @cdomingoenrich
microsoft.com
A collection of both protein sequence data and generative models, designed to serve as a modern resource for protein biology in the age of AI.
@ntenenz
Neil Tenenholtz
4 months
What did we uncover? 🎉 Model scale, data scale, and data diversity all positively impact E. coli expression. 🎉 ☹️ Unfortunately, common computational metrics are poor predictors of expressibility. ☹️ To all those interested in better PLM evals... let the chase begin!
@ntenenz
Neil Tenenholtz
4 months
Modern LM training is a game of 🐱 and 🐭. You improve training signal (e.g., data) to overcome gaps in eval performance, only to strengthen the evals and thus discover new gaps -- starting the cycle anew. With the Dayhoff Atlas, we aim to jumpstart the same race for PLMs. We
@dr_alphalyrae
Vega Shah
4 months
The Dayhoff Atlas: scaling sequence diversity for improved protein generation | bioRxiv https://t.co/TaHb3mFEvY
@xanamini
Alexander Amini
4 months
🧬 The largest open dataset of natural proteins in the world — 3.3 billion seqs 🧠 A 3 billion param hybrid ssm+transformer model 🤗 Fully open-source data + model https://t.co/8X0VfYIQwq Congrats to @avapamini + entire team, including @LiquidAI_'s own Kaeli Kaymak-Loveless
biorxiv.org
Modern biology is powered by the organization of biological information, a framework pioneered in 1965 by Margaret Dayhoff’s Atlas of Protein Sequence and Structure. Databases descended from this...
@avapamini
Ava Amini
4 months
thrilled to share The Dayhoff Atlas of protein language data and models 🚀 protein biology in the age of AI! https://t.co/4wP9kNRUoM we built + open source the largest natural protein dataset, w/ 3.3 billion seqs & a first-in-class dataset of structure-based synthetic proteins
@KevinKaichuang
Kevin K. Yang 楊凱筌
4 months
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
@ntenenz
Neil Tenenholtz
4 months
The Dayhoff Atlas! Open code. Open weights. Open datasets. Thanks @huggingface for helping to facilitate open science. https://t.co/h0WRd1Wu3I @ClementDelangue @julien_c
huggingface.co
@KevinKaichuang
Kevin K. Yang 楊凱筌
4 months
Our models, code, and data are openly available on Github, Zenodo, and Huggingface. https://t.co/6l9iStfqDE https://t.co/T9mGmf2Pgd https://t.co/6X3VPQmJ7I
@ntenenz
Neil Tenenholtz
4 months
To the GPU-poor grad students out there, finding a better predictor of expression is one of the highest leverage contributions you could make to PLM research. Scale isn't always all you need.
@AllThingsApx
Kyle Tretina, Ph.D.
4 months
I was surprised to see that BackboneRef boosts the fraction of Dayhoff-170m pLM generations expressed in E. coli from 27.6% to 51.7% (1.9×) with zero filtering... while common metrics (pLDDT, perplexity) failed to predict wet-lab outcomes (AUROC ≤ 0.57). This quietly re-prioritizes how we
@peteratmsr
Peter Lee
4 months
Today in @ScienceMagazine we present BioEmu1.1 @MSFTResearch. It rapidly and accurately emulates equilibrium distributions of protein dynamics at millisecond timescales. Code and datasets available on @Azure Foundry.
@MSFTResearch
Microsoft Research
4 months
Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulates protein equilibrium ensembles – key for understanding protein function at scale. https://t.co/WwKjj5B0eb
@FrankNoeBerlin
Frank Noe
4 months
BioEmu now published in @ScienceMagazine !! What is BioEmu? Check out this video: https://t.co/PAj96iKvR7
@MSFTResearch
Microsoft Research
4 months
Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulates protein equilibrium ensembles – key for understanding protein function at scale. https://t.co/WwKjj5B0eb
@ntenenz
Neil Tenenholtz
4 months
@ClementDelangue @julien_c 🤗 Collections are great, but limiting Papers to arxiv-only leaves out much of the sciences.
@ntenenz
Neil Tenenholtz
4 months
Raising the bat signal... @ClementDelangue @julien_c
@KevinKaichuang
Kevin K. Yang 楊凱筌
4 months
Getting ready for a release, and I'm kinda sad that @huggingface papers doesn't integrate with @biorxivpreprint