DALL-E’s amazing images are popping up all over the web. That software uses something called a diffusion model, which is trained to remove noise from static until a clear picture is formed.
Turns out diffusion models can design proteins too!
Today we’re sharing a deep-learning method for protein design called RoseTTAFold Diffusion. With minimal input, it turns prompts (“create a molecule that binds X”) into new proteins that fold and function in the lab. We’ve tested hundreds already.
PDF here:
We’re really excited to announce that we’re releasing code for running RFdiffusion! The code is released under an open source license and is free for anyone to use.
Today we're making RF Diffusion, our guided diffusion model for protein design with potential applications in medicine, vaccines & advanced materials, free to use. The software has proven much faster and more capable than prior protein design tools.
It's so great to see our RFdiffusion paper now live @Nature. This article from @ewencallaway gives a great overview of RFdiffusion and protein diffusion models more broadly, and also highlights some of the ways people are already using it in their own research!
Digital art techniques can now devise custom, working biomolecules on demand.
These proteins could form the basis for vaccines, therapeutics and biomaterials.
Read the full story:
I'm excited to announce that I'm part of the team at Xaira Therapeutics!
This project has been some time in the making. I'm convinced that generative modelling, and ML for biology more generally, will play a pivotal role in the next generation of therapeutics. 1/2
We designed binders to five medically-relevant molecules. These binder proteins pass our most stringent in silico metrics and we’re testing them in the lab right now. In the future, it might only take a few seconds to design a high-affinity binder protein for any target you want.
Such a nice article from @ewencallaway @Nature - thanks so much for writing it!!
‘A landmark moment’: scientists use AI to design antibodies from scratch
RFdiffusion is best-in-class for protein backbone generation (low RMSDs to AlphaFold models) and surpasses inpainting and hallucination at scaffolding functional motifs. It makes bigger, more diverse, and more accurate proteins. (600aa protein, gray=design, colors=AF2)
RFdiffusion can also be guided with symmetry. For example, we have designed and are characterizing a new protein that engages all three symmetric ACE2 binding sites on the SARS-CoV-2 spike protein. In this case C3 symmetry works, but any symmetry is possible.
We said last week we were excited to see RFdiffusion being tested in the wet lab. Today, the paper is on @biorxivpreprint, and there's a LOT more exciting experimental data! See David's thread below 👇
We’re very happy to announce that our RFdiffusion manuscript is now on bioRxiv! A lot can change in a week - we’ve now tested over a thousand designs and there’s so much exciting new data! 🧵
Not to mention all the symmetric oligomers @HelenEisenach and @andrewjborst have designed and tested in the lab! Being able to design with any symmetry you want opens up so many applications for therapeutics and enzyme design.
Here at @UWProteinDesign, we’re discovering every day just how powerful RFdiffusion can be. It’s amazing to see technology grow so rapidly. We’re going to make RFdiffusion code available in the near future so everyone can get a chance to design their own amazing proteins!
.@PreethamVi and @SusanaVazTor then used RFdiffusion to design binders to two hormone peptides. These bound too tightly for our instrument to measure! Likely picomolar affinity. That’s the strongest binding to any protein, peptide, or small molecule achieved by computation alone!
Base RFdiffusion is capable, but with tuning and guidance it *excels* at hard tasks, such as scaffolding enzyme active sites! Woody Ahern showed that, with this tuning and guidance, AF predicts (color) that our designs (gray) fold to atomic accuracy. An exciting future for enzyme design lies ahead!
This work builds on the progress and insights of so many people, both @UWProteinDesign and beyond. It’s such an exciting area to be working in right now! Just today there was this news from @generate_biomed:
Today we introduced Chroma, a generative model that creates new proteins & protein complexes given geometric & functional constraints. It learns to transform unstructured, random 3D shapes into #protein molecules, which can have tens of thousands of atoms.
It’s wonderful to see this in print! This was the major part of my PhD work, developing an assay to engineer polarity in cells using de novo designed proteins. There’s so much new stuff in this since the preprint, from the amazing @Lara_K_Kruger – check it out below!
Really excited to be giving this talk alongside @DaveJuergens tomorrow! Do join us if you're interested in hearing about RFdiffusion and how it's being used to tackle a broad range of protein design challenges!
Next Tuesday (2/14) @ 4 pm ET we'll have @_JosephWatson talk about RF Diffusion!!
Sign up at to receive Zoom links in your inbox and add events to your calendar!
Very exciting to see this work out! It still feels quite amazing that you can take three functional bits of protein and bring them all together with novel scaffolds, all while retaining their native function. Congratulations to @karla_mcastro for leading this super cool project!
Announcing AlphaFold 3: our state-of-the-art AI model for predicting the structure and interactions of all life’s molecules. 🧬
Here’s how we built it with @IsomorphicLabs and what it means for biology. 🧵
I'm extremely happy to see the final project from my PhD published today, in collaboration with the @Oneill_Lab, @RachelSEdgar lab, other members of the @deriverylab and many others! Check out the thread below 👇
We’re excited to share our work describing the role of macromolecular assembly & condensation in the acute buffering of cellular water potential, published today @Nature. A thread: 1/16
Proteins are one of the most interesting applications of LLMs / #AI, but I haven't seen many overviews since it's moving so quickly. Here are my running notes on the topic and a 🧵 with some of my favorite links -
Congratulations to Susana and the team! This project used an array of different methods (ML-based and not) to address the problem of binding to hormone peptides, paving the way to better diagnosis of several human diseases
A huge congratulations to Susana, Phil, Preetham and the rest of the team! It was such a pleasure being involved in a project that brought together a whole range of approaches and people to make big progress on an important problem!
🎉 Excited to share that our paper "De novo design of high-affinity binders of bioactive helical peptides" is officially out!
Hats off to coauthors @definitelyphil and @PreethamVi for making this project a reality. 🙌 Grateful for their collaboration!
We’ve also included a lot of examples to hopefully help get everyone up and running with the code. One notable improvement since we posted the preprint is that the model can now run with significantly fewer steps at inference time, giving (at least) a 4X speedup!
Congratulations to Nate and colleagues for a really amazing paper 🎉 This work laid the groundwork for so much of the subsequent methods development in binder design!
We're extremely grateful and excited to have the backing to try and realise this vision! I'm very excited to work with our growing team of amazing people in the UK and US.
Read more about the project here:
To our knowledge, this project also details the first designed binders made to targets from protein sequence alone (i.e. no starting structure), with a Hallucination approach developed in collaboration with @JosephRogers100
The past month we have seen some amazing AI news 🤖. But we should be careful not to miss out on what I believe could be one of the most disruptive proteomics technologies this decade. Protein Diffusion could usher in a new Protein Design Era.
This work has been a fantastic collaboration between many people at @UWProteinDesign, @Columbia and @AIHealthMIT. We’re all so excited to see what the wider scientific community can make with RFdiffusion.
@BioExplorr Thanks!! My guess would be it'll be made available in the next few months (we're super committed to making this available as soon as reasonably possible!). A Colab is also a fantastic idea :D
Congratulations to Emmanuel Derivery (@DeriveryLab), Group Leader in @CellBiol_MRCLMB, who is the 2024 recipient of the Hooke Medal from @Official_BSCB!
Read more about Emmanuel and his research on asymmetric cell division here: #LMBNews
Thanks, as always, to David Baker, and to all the coauthors who helped make this project a reality: @dejsee, Connor Weidle, @wanderingriti, Ellen Shrock, @definitelyphil, Buwei Huang, Inna Goreshnik, Russell Ault, Kenneth Carr, @SingerBenedikt, Cameron Criswell...
This work *does not* solve drug design, but it shows for the first time that accurate antibody design is possible. Hopefully this work will lay the groundwork for future developments. I hope that one day making a therapeutic antibody will be as easy as pressing “go”!
@chrisfrank662 Thanks! So RF scales ~quadratically with protein length, so for a 1000 amino acid protein it takes about 30s per diffusion step (and hence, for a 200-step trajectory ~1.30h, as opposed to ~2 minutes for a 100aa protein) on an A4000 GPU. 200 steps is probably more than you need though!
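The scaling described in this reply can be sketched as back-of-the-envelope arithmetic. This is only a rough estimate, assuming purely quadratic scaling anchored at the quoted figure of about 30 s per step for 1000 residues; the function names are made up for illustration and are not part of RFdiffusion.

```python
# Rough runtime estimate, ASSUMING purely quadratic scaling with length.
# Reference point from the post: ~30 s per diffusion step at 1000 residues
# on an A4000. Function names are hypothetical, not RFdiffusion's API.
REF_LENGTH = 1000
REF_SECONDS_PER_STEP = 30.0


def seconds_per_step(length: int) -> float:
    """Per-step cost scaled quadratically from the reference point."""
    return REF_SECONDS_PER_STEP * (length / REF_LENGTH) ** 2


def trajectory_seconds(length: int, steps: int) -> float:
    """Total time for a trajectory of `steps` diffusion steps."""
    return seconds_per_step(length) * steps


if __name__ == "__main__":
    for length in (100, 1000):
        for steps in (200, 50):
            minutes = trajectory_seconds(length, steps) / 60
            print(f"{length} aa, {steps} steps: ~{minutes:.1f} min")
```

Pure quadratic extrapolation gives roughly one minute for a 100 aa protein at 200 steps, a bit under the ~2 minutes quoted, so fixed per-step overhead presumably matters at small sizes. The step count is a parameter because the reply notes that 200 steps is probably more than needed.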
@EmilyLeproust @schubertcm @UWproteindesign Thanks for the nice post! We'd be very keen to let you know how Twist is already enabling our ongoing RFdiffusion developments. Do DM me if you'd be interested in chatting!!
@proteincapsid It should run fine, at least for small proteins. I just made this 100 amino acid protein on my local computer (no GPU) in 5.30 minutes. 300 amino acids would take around 50 minutes. It's obviously quite a lot faster with a GPU (available on Google Colab)
@notresz Yes! ProteinMPNN takes the protein backbone atoms (N-Ca-C-O) and finds an amino acid sequence that would fold to that backbone structure (and it's very good at it!). RFdiffusion instead makes the protein backbone, which we then feed to ProteinMPNN :D
A special mention and thanks to @JosephRogers100, with whom I worked very closely on the peptide binder Hallucination part of this project. Those binders were the first validated proteins I ever designed, and I'll never forget that "oh my word it actually works" moment!
@Jiaxing_Tan_ The hotspot input is chainletter-residue_number (rather than AAidentity-residue_number). This case seems to work fine🙂 if you provide an email or something I can send you the submission script I used. We'll also make this clearer in the README!
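The distinction in this reply can be illustrated with a small parser. This is a sketch for illustration only, not RFdiffusion's own parsing code: `parse_hotspot` is a hypothetical helper, and it assumes a single chain letter immediately followed by the residue number.

```python
import re

# Illustrative parser only: NOT RFdiffusion's own code. It assumes a token is
# one chain letter immediately followed by a residue number, e.g. "A30".
_HOTSPOT = re.compile(r"^([A-Za-z])(\d+)$")


def parse_hotspot(token: str) -> tuple[str, int]:
    """Split a token such as 'A30' into (chain letter, residue number)."""
    match = _HOTSPOT.match(token.strip())
    if match is None:
        raise ValueError(f"expected chain letter + residue number, got {token!r}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    # "K30" reads as residue 30 on chain K, not as "lysine 30".
    print(parse_hotspot("A30"), parse_hotspot("K30"))
```

Because the leading letter is read as a chain ID, a token written with an amino acid one-letter code in front would silently point at a different chain instead of raising an error, which is why the format is worth spelling out.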
@br_jimenez @sokrypton Really glad you're using the code! So to be clear, here, you're designing the cyan peptide? I think it's reasonably likely AF2 (non-multimer) wouldn't predict these well - I think it can struggle with small peptide sequences
These binders typically bound via rigid secondary structure-based interactions, however, which contrasts with how nature has “solved” the binder design problem. Nature uses antibodies, which interact with their targets through more flexible CDR loops.
In this work, we delineated a mechanism through which cells buffer the intracellular availability of water in response to acute changes in cellular water availability. It was a real joy to work on such a collaborative project!
New preprint! We came up with 2 methods to design de novo scaffold proteins to hold arbitrary functional motifs: hallucination (optimizing a seq against predictions of RoseTTAFold (RF)) and inpainting (recover masked regions of seq+struc.) 1/n
I spend a lot of my time thinking about proteins, but it's important to remember that protein folding and function critically depend on the solvent (water) in which the protein is dissolved.
This project grew out of discussions with my friend and co-lead author @naterbennett0 last year. As a reminder, last year we published and released RFdiffusion:
Today we're making RF Diffusion, our guided diffusion model for protein design with potential applications in medicine, vaccines & advanced materials, free to use. The software has proven much faster and more capable than prior protein design tools.
Excited to share our work on zero-shot mutation effect prediction using RoseTTAFold! Thanks @minkbaek, @DaveJuergens and @_JosephWatson for being amazing collaborators!! Preprint here:
@MathieuEmile @CalSleeper @FlightFreeUK I 100% agree! The sleeper is so great, and incredibly convenient, but since the upgrade it's just way too expensive to get a bed. Shared rooms would still be much more comfortable than the seats!
The question was though, would these antibodies actually work?
@RobertRagotte and @AndrewJBorst, co-lead authors on the study, led the experimental effort. We designed VHH binders to four unrelated targets. A structure of one of them shows that it binds almost exactly as designed
RFdiffusion excels at protein interface design (making a binder to something). We demonstrated very high experimental success rates in the paper, and across the Institute for Protein Design scientists were making binders to historically extremely challenging targets.
@ozalabCP Yes, for clarity, in this work we are only designing CDRs. We keep the VHH framework fixed and just design new CDR loops to interact with a user-specified epitope
Antibodies have many advantages as therapeutics, and are as such the biggest class of therapeutics globally. However, to date, design of structurally accurate novel antibodies has not been demonstrated. We wondered if generative models like RFdiffusion could be the answer.
@danofer Good question! So the input is actually protein backbone 3D coordinates, which we parametrize as C-alpha (x,y,z) coordinates and a residue (backbone) orientation. The output is then also 3D backbone coordinates. We then generate a sequence with ProteinMPNN
@Eyesgack We actually put this into an accompanying manuscript, which includes both some cool new RFdiffusion advances (partial diffusion and peptide binder design), as well as other methods. The binders work really well!
@Robot83821931 This is a great question! It's an area that we're really interested in, and one that others (including Tommi Jaakkola, a supervisor on this project) have worked on (). Currently it's not possible (at least not explicitly) to do 1/
We therefore built upon RFdiffusion to train a generative model specifically for antibody design. This model is capable of designing diverse and truly de novo antibodies.
@LeoChan213 From the outset, we wanted RFdiffusion to be computationally tractable to run, so, given that each step is quite costly, we limited it to 200 steps during development. But benchmarking (to be shown in the final manuscript) then showed we can use just 50 with no performance drop!
@ozalabCP I see these as separate problems really. If you already have an antibody/VHH, optimising it with ML/experimental methods is a very good idea. Our work designs VHHs to sites to which you don’t have one. For example, the TcdB site doesn’t have an antibody/VHH, but we made one 😊