I'm going to keep saying this until someone proves me wrong: AlphaFold(2, 3, 10000, whatever) or any structure prediction tool will not generalize to protein conformations if its training is optimized to "ground truth" structures in the PDB! Crystallized or cryo-EM structures
Current protein models (ESM-2, AlphaFold2, ...) only encode the 20 wild-type amino acids -- what about PTMs, which significantly influence the diversity of the proteome? 💁♂️ To solve this, we present the first PTM-aware protein language model, PTM-Mamba!
Had a day to reflect on the release of ESM3, and just wanted to share a few thoughts (and a few shameless highlights of my lab's work! 😅). Before that, for the people who know our stuff, you know that I am an ESM evangelist: I think pLMs will be the future of protein design. 🪄
AlphaFold3 is out -- and it's all-atom, diffusion-based, with no equivariance constraints! 🧬 For biologists, now you can model PTMs, ions, DNA/RNA, small molecules -- but only 10 jobs/day. Unfortunately for us in the TechBio community, no code (though a lot of replication
What if you could design binders to specific spots/motifs on target proteins: conserved epitopes on pandemic viral phosphoproteins, disordered regions on dysregulated transcription factors, or even breakpoints on fusion oncoproteins? That would unlock unprecedented specificity
Hi everyone! Excited to share that I will join the amazing faculty at
@DukeU
as an Assistant Professor of Biomedical Engineering in July! We will design therapeutic proteins for genome editing, proteome editing, and ovarian cell engineering! 💻→🧬→🧫→🏥
Super excited to share our new protein language model: FusOn-pLM, developed by my wonderful first-year PhD student
@SophieVincoff
and her team of amazing undergrads!! 🎉
Sure, we know AlphaFold3/RFDiffusion and other structure prediction/design methods are great -- but what
Ahh, I know I've ignited a big sequence vs. structure debate (unintentionally, I promise -- I'm very agreeable!) 🔥 But just wanted to share the good news that the
@NIH
@NIGMS
recently awarded my lab the R35 MIRA Award to develop isoform-specific therapeutics to undruggable,
Really interesting paper: ESM-all atom (ESM-AA) to compete with structure-focused Rosetta-AA and AlphaFold3! ✨ ESM-AA uses multi-scale code-switching to model molecules at residue and atom scales, using multi-scale position encoding to capture contexts in both regimes. It seems
With the hype around ESM3, I just want to show how impactful pLMs can be, even beyond biomedicine! 🌏 I'm really excited to present MetaLATTE, our brand new metal binding predictor trained on ESM-2 latent embeddings via a super unique multi-task learning strategy, now live on
#NeurIPS2023
was so much fun! My favorite talk from the workshop has to be from
@PrescientDesign
! Similar to
@OpenAI
's Consistency Model formalism, they sample a score-based data manifold with one step denoising for antibody design!
#GenerativeAI
Been working on a lot of our lab's manuscripts lately (stay-tuned), but just wanted to share one of my favorite papers I've been reading: microenvironment-aware hierarchical prompt learning to predict ΔΔG changes in PPIs upon mutation! 🤩
We've seen a lot of ΔΔG prediction
An exciting
@biorxivpreprint
update on our recent PepPrCLIP model!🌶️📎 Hopefully, you all remember PepPrCLIP, where we apply Gaussian perturbations to the peptidic latent space of ESM-2 to de novo generate naturalistic peptides, and then input these new peptide sequences into a
This is my favorite cell embedding paper! SATURN learns universal cell embeddings by coupling gene expression with pLM embeddings for cross-species integration of functionally-related genes (macrogenes). 🌟 Amazing for atlas integration!
Paper:
Code:
Our new preprint is up -- and it's completely de novo! 🤩Our Peptide Prioritization via CLIP (PepPrCLIP) model is the first de novo binder design algorithm only needing the target amino acid sequence (no 3D structure needed). 🌶️📎Take a read here!
Okay, this is SO COOL from
@Mila_Quebec
: Aaren, which leverages a parallel prefix scan algorithm, can be trained in parallel (Transformers) but only requires constant memory (RNNs)! 🤩 It's kind of similar to RWKV and Linear Attention, but unlike those methods, Aaren exactly
Excited to share our newest Cas9 out in
@NatureComms
! 🥳 We recombine our Sc++ enzyme (NNG) with the PID of
@BKleinstiver
's SpRY (NRN) to generate a PAM-flexible chimeric Cas9 = SpRYc! 🌶️ SpRYc can edit at diverse NNN loci -- take it out for a spin! 🧫
We've updated PepMLM with new comparisons to RFDiffusion! 🌟Without using structure, PepMLM has a higher in silico hit-rate for peptide generation on structured targets (those with existing peptide binders) than RFDiffusion. More in vitro data coming soon!
Exciting news from the lab! 👀 We have updated PepMLM on the
@arxiv
with new computational and experimental benchmarking results! 🥳
Just as a reminder: we trained PepMLM via a novel span masking strategy that positions peptide binder sequences at the C-terminus of their target
🧵 (1/4) As a brand-new PI, I'm psyched to present my group's first preprint! Here, we apply
@OpenAI
's CLIP architecture to design target-specific peptides, fuse them to E3 ubiquitin ligases, and degrade pathogenic proteins in a pipeline called Cut&CLIP. ✂️
Diffusion has worked incredibly well for direct protein structure (RFDiffusion) and sequence (EvoDiff) generation! 🌟 Our new AMP-Diffusion model applies latent diffusion to ESM-2 pLM embeddings to generate diverse, naturalistic AMPs! Check it out:
Just getting around to posting about this beautiful paper from
@DreamFoldAI
!! Learning a joint representation based on SOTA sequence/structure embeddings and decoding the joint representations of the inputs into SE(3) vector fields is very clever -- my hunch is that the ESM-2
Today, I "officially" begin my faculty position at
@DukeU
! 🥳 Super excited to welcome my wonderful team of postdocs, PhD/Masters students, and undergrads from around the globe to our little but mighty lab in Durham ― let's change the world together! 💪🏾
SaLT&PepPr is published in
@CommsBio
! 🥳 Here, we fine-tune the ESM-2 pLM to identify peptidic binding sites on target-interacting partner sequences. We fuse these "guide" peptides to E3 ubiquitin ligases to degrade disease-causing proteins! 💻➡️🧫 (1/n)
Two really cool papers to read this weekend: one new dataset and one new algorithm! 🧐
First, an incredible new dataset of protein-coding variations from exome sequencing of 983,578 diverse individuals -- over a million new variants! 🤩Retrain ESM-1v anyone?
Paper:
How can AI help us cure rare diseases? In the Aug/Sep
@WorkingAtDuke
magazine, I talk about my lab
@DukeUBME
and how we’re training generative language models to help us design drugs to target complex proteins. Keep an eye out for the magazine in your mailbox!
@dukeresearch
Such a beautiful study from our friends in
@rohitsingh8080
’s lab here at
@DukeU
! An autoencoder on top of ESM-2 for fixed dimensionality is brilliant, and perfectly suited for this problem, one that has plagued delivery and other drug design applications. 🌟 I can imagine so many
De novo protein design is great, but nature has millions of proteins- why not repurpose them?
Introducing Raygun, a new approach to protein design. It allows you to miniaturize, magnify or modify any protein. We synthesized miniaturized variants of eGFP and mCherry! 1/
Unique application of GPT-like LLMs to RNA! Model-prioritized mutations (via relative log likelihood to WT) seem to do pretty well on improving thermostability of 23S rRNA sequences. Always exciting to see sequence-based models do well for design. 😊
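For readers new to the idea, here is a minimal sketch of what scoring mutations by "relative log likelihood to WT" can look like. It assumes the model exposes per-position probabilities over nucleotides; an autoregressive model would more likely compare whole-sequence likelihoods. The probability table and function names are hypothetical toy values, not taken from the paper.

```python
import math

# Hypothetical per-position probabilities (toy values, not from any real model).
TOY_PROBS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    {"A": 0.2, "C": 0.2, "G": 0.5, "U": 0.1},
    {"A": 0.1, "C": 0.6, "G": 0.2, "U": 0.1},
]


def relative_log_likelihood(wt, pos, mut, probs):
    """log p(mutant base) - log p(wild-type base) at one position.

    A positive score means the model prefers the mutant over the WT base.
    """
    return math.log(probs[pos][mut]) - math.log(probs[pos][wt[pos]])


def rank_mutations(wt, probs):
    """Score every single-base substitution and sort best-first."""
    scored = []
    for pos, wt_base in enumerate(wt):
        for base in probs[pos]:
            if base != wt_base:
                score = relative_log_likelihood(wt, pos, base, probs)
                scored.append((score, pos, base))
    return sorted(scored, reverse=True)
```

Calling `rank_mutations("AAC", TOY_PROBS)` puts the A-to-G substitution at position 1 first, since that is the only site where the toy model assigns the mutant more probability than the wild type.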
This is such cool work published in
@NaturePhysics
! 🤩 An N-terminal unstructured and flexible glycine–serine peptide tag can accelerate nuclear import rates for stiff protein cargos by increasing their deformability. Definitely useful for my experimentalists to deliver our
Well here it is: ESM3!! 🌟 The training regime is a bit different than your classic ESM-2/BERT model: you can start with a fully masked sequence and iteratively unmask. Super similar to our PepMLM model with ESM-2, which showed significant promise at full masking and unmasking.
We have trained ESM3 and we're excited to introduce EvolutionaryScale.
ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins.
Read more:
Our first deep dive into reproductive bioengineering! Here, we identify two transcription factors that, when overexpressed in iPSCs, generate functional granulosa cells, which support germ cell maturation, follicle formation, and steroidogenesis! 🤩
I'm so excited to co-host the
@gembioworkshop
at
#ICLR2024
in Vienna!! 🥳To both my computational and experimental friends, please reach out if you are interested in submitting a paper or serving as a reviewer -- it should be an incredible, one-of-a-kind meeting! 🌟
Announcing the Generative and Experimental Perspectives for Biomolecular Design workshop at
#iclr2024
!
We hope to bring together researchers in ML and experimental biology to accelerate progress on real-world applications.
Website:
Paper deadline: Feb 3
Such a clever new LM architecture: Orchid uses data-dependent convolutions to overcome the quadratic complexity of traditional attention. 🌺 Achieves quasilinear scalability with dynamic kernel adjustments for long sequences! Also, super strong as a BERT model!
Paper:
KANs seem really cool! 🌟Stemming from the Kolmogorov-Arnold representation theorem, they introduce learnable activation functions on edges, rather than traditional fixed activations on nodes! Still need to figure out which activation functions are good -- not too clear from the
A little late, but super excited to share a great collaborative work by our team at Duke, my postdoctoral lab of
@geochurch
, and my company
@GametoGen
to engineer a functioning human ovarian follicle from iPSCs using our STAMPScreen method! 🧫
After almost three years of painstaking effort, today, I'm excited to present my team's amazing work on specifying human oogonia via combinatorial transcription factor induction in iPSCs! 🥳 An incredible collaboration with my good friend
@krammecc
!
Beautiful stuff
@AlexanderTong7
@mmbronstein
and team!! MFM enables generative models to accurately match vector fields on data manifolds -- so many cool applications: LiDAR navigation, unpaired image translation, single-cell RNA trajectory inference...such impressive work! 😊
A very cool use of graph-based diffusion for enzyme evolution! Also love the application to pAgo, which can be a very powerful programmable gene editing tool with improvements.
#GenerativeAI
#ProteinDesign
Happy to share our early work on generating binding peptides conditioned ONLY on the target sequence! 🌟PepMLM masks cognate peptides at the end of target protein sequences, and tasks ESM-2 to fully reconstruct the binder region. 😷
#GenerativeAI
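A rough sketch of the masking setup described above, assuming single-character residue tokens and a generic `<mask>` token. The helper name and label format are hypothetical, and real ESM-2 tokenization also adds special tokens that are omitted here.

```python
MASK = "<mask>"  # placeholder mask token; the real tokenizer's token may differ


def build_masked_input(target, peptide):
    """Append a fully masked binder region to the end of the target sequence.

    Returns (tokens, labels). Labels are None on target positions, so only
    the binder region would contribute to a reconstruction loss.
    """
    tokens = list(target) + [MASK] * len(peptide)
    labels = [None] * len(target) + list(peptide)
    return tokens, labels
```

At generation time the same layout applies, except the peptide is unknown: you pick a binder length, append that many mask tokens, and let the model fill them in.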
Training a VAE-based representation model, performing diffusion on the latents to generate new aptamers, with SPR results showing an average 3-fold Kd reduction? Probably needs a bit of refinement and validation, but so cool! 🫨
#GenBio
Very interesting study that shows AlphaFold3 captures a relatively global effect of mutations on PPIs by learning a smoother energy landscape, but doesn't seem to be as atomically fine-grained as standard MD. Could still be good for generating synthetic mutant datasets
Hi all! We have released a preprint on our novel SaLT&PepPr language model to design peptide-guided degraders using PPI information. 🥳 An incredible effort from
@garykbrixi
and team at
@DukeU
, and our collaborators at
@Cornell
with
@mattdelisa
! 🤜🏾🤛🏾
Today is my happiest day as a scientist: my incredible undergrad
@SabrinaKoseki
was awarded the
@NSF
GRFP! 🍾 I've mentored Sabrina for 3+ years now, and her brilliance, curiosity, and diligence EPITOMIZES scientific excellence. I've been so lucky to have her as my student!😊
Nice! A great application of Mamba on concatenated, homologous protein sequences! 👍 Happy the authors mentioned our recent PTM-Mamba model, where we showed the application of Mamba to represent modified sequences! Overall, SSMs seem to have found a nice home in biology! 🧬
The Programmable Biology Lab is all set up at
@DukeU
— all in 1.5 days! 💪🏽 Shoutout to my amazing Harvard CS undergrads, Garyk and Suhaas, for helping make this happen. 🌟Time to design some programmable proteins! 💻➡️🧫
My favorite paper of the week: CONDITIONAL preference optimization via mDPO to ensure that both visual and textual modalities are equally considered! 🤩Architecturally, the authors introduce a reward anchor to maintain the likelihood of preferred responses, significantly reducing
We're so grateful to
#EndAxD
for believing in our binder-guided degrader technology, powered by
#GenAI
, to study and treat Alexander disease. 💻🧫 Thank you to
@thomas_wagner
and little Max for inspiring my lab to study, treat, and cure AxD! 🙏🏾
So simple and clever: for new sequences, examine the loss patterns of tokens on a pre-trained model then train the new language model with a focused loss on tokens with higher excess loss. Would be fun to fine-tune pLMs with this method!
Paper:
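To make the idea concrete, here is a minimal sketch of a loss focused on high-excess-loss tokens. It assumes you already have per-token losses from the model being trained and from a reference model. The function name and the `keep_fraction` parameter are hypothetical and are not the paper's exact recipe.

```python
def selective_loss(train_losses, ref_losses, keep_fraction=0.5):
    """Average the training loss over only the tokens with the highest excess loss.

    Excess loss = training-model loss - reference-model loss, per token.
    Returns (mean loss over kept tokens, sorted indices of kept tokens).
    """
    excess = [t - r for t, r in zip(train_losses, ref_losses)]
    k = max(1, int(len(excess) * keep_fraction))
    # Indices of the k tokens with the largest excess loss.
    keep = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)[:k]
    loss = sum(train_losses[i] for i in keep) / k
    return loss, sorted(keep)
```

In a real training loop the same selection would be applied as a mask over the per-token cross-entropy before averaging, so gradients only flow through the kept tokens.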
Remember
@DeepMind
's protein folding algorithm, AlphaFold2? I'm excited to share our team's early work on our own protein structure prediction pipeline! (1/3)
My last day together at the
@medialab
with my superhero team of undergrads,
@SabrinaKoseki
, Teodora, and Emma! So very proud to have watched you three become such brilliant young scientists, and so privileged to have mentored you as my first students! 🌟🥹
As the second week of my lab's existence comes to a close, I'm super grateful for the dedication and commitment of my students to our (quite lofty) goals. Yeah, we may not be the most well-known bunch, but we're about to do big things! 💪🏾
#FearlessFriday
An integrated pLM trained on BOTH sequence and structure! 🤩 We're excited to leverage these embeddings in our lab -- the T5 architecture is super powerful for design! So many congrats to
@HeinzingerM
and the
@rostlab
for another pioneering work! 👏
Hi friends! Just a gentle reminder that our
#ICLR2024
@gembioworkshop
paper submission deadline is coming up in ONE week: February 3rd! We have multiple tracks that should be of interest to both the ML community and experimentalists!
There’s really no better feeling than when your first mentees get the
@NSF
GRFP!
@garykbrixi
, you’re going to continue to be a scientific rockstar in grad school! 🌟 You and
@SabrinaKoseki
are the most deserving recipients! I’m so proud of you guys. 💙
Residue energy-based preference optimization of antigen-specific antibodies. 🫨 A very interesting way to get around the lack of diverse paired structures for training!
How to steer generative antibody design models to design antibodies satisfying user-desired preferences, such as rationality and functionality🤔?
We are thrilled to introduce AbDPO, a general framework for antibody design via energy-based preference optimization.
The preprint,
Hi all, I'll be presenting our team's work on designing target-binding peptides with generative language models (and other cool protein design work!) tomorrow
@valence_ai
! 💻🧫 Please find Zoom details here: . Looking forward to seeing you there! 🥳
Beautiful beautiful work! I’ve said that pLDDT really tells you about how disordered proteins are — now we can leverage AF2 with simulations to explore it! 🥰 Would be excited to see this extend to AF2-Multimer as well. 😅
Happy to share the publication of our paper in
@Nature
Conformational ensembles of the human intrinsically disordered proteome
Work led by
@GiulioTesei
and
@AnnaIdaTrolle
. I'll post more later, but for now here is a link:
and a short movie about the work
Very excited to have our tour-de-force genetic screening paper out in
@CellRepMethods
! We've developed a powerful, integrated pipeline for identifying, screening, and interrogating genetic perturbations for defined phenotypic outputs. Let's review how STAMPScreen works! [1/5]
When scientists at the Wyss,
@harvardmed
, &
@medialab
got frustrated w/ existing tools & methods for gene engineering, they made their own.
@PranamMIT
,
@krammecc
, & colleagues invented STAMPScreen to help scientists get from a database to results fast.
@andrewwhite01
Miniproteins are this weird intermediate between the specificity of an antibody and the permeability/drug-like properties of peptides. I personally think miniproteins are a thing because diffusion/flow-matching are not ideal for antibodies or peptides. At least with the de novo
@Pandeylab
@NatureComms
Do you understand how dangerous it is to draw strong conclusions from poorly analyzed data with severe methodological flaws? People look at journals such as
@NatureComms
for legitimate conclusions that can affect hiring, public practice, etc. This paper should be retracted!
An incredible paper that I have been excitedly anticipating is out in
@Nature
today -- so many congratulations to my brilliant friend and collaborator
@MartinPacesa
and the
@MartinJinek
lab on this groundbreaking structural elucidation of Cas9 conformational activation! 🎉🎊🎈
Very excited to see our work on Cas9 conformational activation out in
@Nature
!
What started as a very poor crystal structure back in 2017 in the
@MartinJinek
lab, turned into a full blown cryoEM project during the pandemic.
Hi all! I hope you can take a few minutes to learn about our latest research on designing target-binding peptides (amongst other useful proteins) using
#GenerativeAI
! Thanks so much to
@valence_ai
for having me! 🙌🏾
A very cool new pLM from WWC-winning 🇪🇸! Only 14.8 M trainable parameters via training on UniRef50, focusing primarily on enzymes. 😮💨 Generated seqs have plausible biochemical properties, but wish there was more comparative benchmarking with bigger pLMs.
Check out our new COVID-19 diagnostic platform, out today in
@ScienceAdvances
! Peptide beacons + mini-TIRF technology = sensitive S-RBD protein detection at femtomolar resolution. Really a great extension of our ubiquibody design work! 🤩
Read more about what my company
@GametoGen
is up to: . It's crazy to see how far it's come since
@martinvars
and I founded it back in 2020! 🤯
@DinaRadenkovic
and
@krammecc
continue to lead it to newer heights -- you guys are amazing! 🌟
Super excited about trying out CellREADR from our friends in Josh Huang's lab here at
@DukeNeuro
! This will be very useful for dynamic cell state engineering applications. 🧫
Hi everyone! We are hosting Duke AI Day (sponsored by
@nvidia
) on our beautiful
@DukeEngineering
campus on June 7th!☀️If you work on developing or applying AI algorithms, we'd love for you to register (it's free to everyone!) and submit an abstract:
Okay, I know
#AutoGPT
is the AI hype right now, but check out the Regression Transformer! Discontinuous-masking-based autoregression for conditional protein sequence generation?! Yeah, my lab's definitely using this! 😎
Check out our early-stage work on designing computationally-optimized peptides that bind to SARS-CoV-2 and recruit E3 Ubiquitin Ligases for degradation! Could be a potential alternative to PAC-MAN?
#COVID19
With
#COVID19
raging through the world this winter, alongside promising vaccines from
@moderna_tx
,
@pfizer
, and
@AstraZeneca
, we present a novel antiviral platform for targeted intracellular degradation of SARS-CoV-2 via computationally-engineered peptide fusions. (1/4)
For the few of you who follow me (love you guys! 🥰), you may have noticed the not-so-subtle Twitter handle change.😅 I hope you will join me on this new journey to solve the biggest problems in biotech! Let's do this together. 💪🏾
Absolutely beautiful work from my talented and amazing friend,
@JulesGrunewald
!! If you're crazy about base editing, what's better than doing both C->T and A->G at the same time?! 🧬🍾
Very excited about these phase 1 results from
@CaribouBio
! The ultra high specificity of the chRDNA platform was crucial for achieving this. Very glad we had the opportunity to collaborate and elucidate the mechanism of increased specificity.
Really excited for this paper to be out: Collaborating with the incredible group of Toshi Shioda, we devise a serum-free, feeder-free culture to grow hPGCLCs indefinitely, while maintaining germ cell identity! A powerful tool for further engineering. 💪🏾
This is the incredible work of my brilliant, prodigious (brand new!) graduate student,
@pengzhangzhi1
. He just started in JANUARY and immediately innovated this beautiful architecture!
@Pandeylab
@NatureComms
If this was good science, then I'd be like okay, so be it and let's work to improve whatever conclusions we found troubling. But no, these were poorly drawn conclusions with no methodological or empirical basis.
#Llama3
is insane. 15 trillion tokens, 405 billion parameters with compute budget of 3.8×10^25 FLOPs. 😵💫 With that scale, I love the relative architectural simplicity: autoregressive decoder + DPO, with RoPE and grouped query attention built in. So clean. 🧼 I feel like we're
So many congratulations to my wonderful friend and collaborator, Dr.
@JooyoungLee10
!! What an incredible PhD thesis and defense!! So excited to have you in Boston!! 🎊🧬🎊
PTM-Mamba is the first pLM to explicitly tokenize PTM residues enabling accurate capture of PTM-specific effects in its latent space. ✨ To do this, we take advantage of the amazing hardware-aware, sub-quadratic Mamba architecture. We train novel bidirectional Mamba blocks whose
LIMA from
@MetaAI
is super exciting -- it shows us that well-curated, pre-trained LLMs are strong generalizers! For us protein designers, this is great news: we have solid pre-trained models (i.e. ESM-2) but limited, high-quality task-specific data. 🙌🏾
It's always a good day when FAIR and team come out with a new ESM model! The ability to generate novel sequences into desired structure holds huge promise for biologics and enzyme design -- excited for the team (
@r_manvitha
, et al.) to get started with it!
#proteindesign
Here’s what we learned from inverse folding on millions of
#AlphaFold
structures. Exciting time to bring a 800x new scale to
#proteindesign
. ESM-IF1 more accurately designs sequences to fold into desired structure, also unlocking new design capabilities.
Please have a read of our paper (), try out our code on your datasets, make MegaGate your go-to cloning method (I have!), and start screening! As this is my first senior author paper, I am so proud of
@krammecc
,
@amplesa
, and the whole STAMPScreen team! 🎉
Mixture-of-Depths! Such a cool, simple concept: dynamically allocate compute only to the most significant top-k tokens. This makes FLOP expenditure predictable, and can also serve as a great regularizer! Excited for the lab to try! Fun stuff,
@GoogleDeepMind
!
Paper:
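A toy sketch of the top-k routing idea, assuming scalar "tokens" and precomputed router scores so it stays dependency-free. The function name and the plain residual update are simplifications for illustration, not the paper's implementation.

```python
def mixture_of_depths_layer(tokens, router_scores, block, k):
    """Apply `block` only to the k tokens with the highest router scores.

    Tokens that are not selected skip the block and pass through unchanged,
    so compute per layer is fixed by k rather than by sequence length.
    """
    top = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)[:k]
    selected = set(top)
    out = []
    for i, x in enumerate(tokens):
        if i in selected:
            out.append(x + block(x))  # residual update for routed tokens
        else:
            out.append(x)  # skipped tokens ride the residual stream
    return out
```

For example, with `tokens=[1.0, 2.0, 3.0, 4.0]`, `router_scores=[0.1, 0.9, 0.3, 0.8]`, `block=lambda x: 10 * x` and `k=2`, only the second and fourth tokens are processed. Because `k` is fixed ahead of time, the FLOP budget per layer is known before the forward pass.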
This is the result of brilliant algorithm design by amazing undergrads
@bhat_suhaas
and
@kalyanmpalepu
, steadfast experimental validation by my PhD students,
@vivi_64_
and
@lauren_hong11
, and extensive computational benchmarking by the rest of the team. I'm a super lucky PI! ☺️
@sokrypton
@Patrick18287926
I love
@Patrick18287926
's work and we both work on peptide design (go peptides!), but I think this paper fell a little victim to the chicken and egg problem. It's really no one's fault, because they did the best with what they had in the PDB (and it's still a great study!):
I guess all Cas9 loops aren't good? While our ScCas9 and Sc++ loops stabilize PAM interaction, this RuvC loop stabilizes mismatch binding at the PAM-distal end of the spacer. By mutating these residues, the authors create SuperFi-Cas9 with very high on:off targeting ratios! 🤩
1. I am thrilled to share the accepted version of our Cas9 mismatch manuscript, now in
@Nature
! Here’s a (not-so) brief outline of some of the findings from our original preprint, and some VERY exciting new findings
Hi everyone! Just a reminder to register for Duke AI Day, which will be at
@DukeEngineering
on June 7th! 🤖 Please also feel free to submit a short abstract for posters/talks -- lots of great people, food, and science! More info here: See you there! ☀️
This paper was rejected, reviewed, rejected, and reviewed again MULTIPLE times before finally finding a home at
@CommsBio
, one of my favorite journals! As a new PI, it was painful and rewarding, and I'm so grateful for my students and collaborators for staying the course together! 🥹