Excited to finally be able to share our ICLR work critically analyzing the capacity of deep learning docking methods to generalize and how to improve this (spoiler: scaling, augmentation and RL)! With this, we release a new, significantly improved version of DiffDock!
A thread! 🧵
Excited to share DiffDock, a new non-Euclidean diffusion model for molecular docking! On PDBBind, the standard benchmark, DiffDock outperforms by a huge margin (38% vs 23%) the previous state-of-the-art methods that were based on expensive search!
A thread! 👇
New paper!🤗
Do all your samples from Stable Diffusion or Dall-E look very similar to each other? It turns out IID sampling is to blame! We study this problem and propose Particle Guidance, a technique to obtain diverse samples that can be readily applied to your diffusion model!
Great to finally be able to share our work on Subspace Diffusion Generative Models accepted at
#ECCV
!
Generalizable technique to improve the performance & speed up diffusion models by restricting the diffusion via projections onto subspaces + insights 👇
Generative models are necessary to fully capture uncertainty and conformational flexibility of protein structures, but how can we build such models? At the ICLR MLDD workshop, we'll present EigenFold, work led by Bowen Jing with undergrad students Ezra Erives and
@peterpaohuang
!
Happy to share that DiffDock was accepted to
#ICLR
and even happier to share new exciting results in the updated manuscript! Unlike previous docking methods, DiffDock retains a large part of its accuracy when run on computationally folded protein structures! A thread 🧵
Honoured that our work "DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking" received the Best Student Paper award yesterday at the score-based modelling workshop at
#NeurIPS2022
among so many outstanding submissions!! 🎉🙏
Very cool to see that
@nvidia
's CEO has just announced their new BioNeMo cloud service in his keynote and highlighted DiffDock as the molecular docking algorithm they have chosen to integrate! Great to see the power of open-source science!
My master's thesis is online! In it, I summarized the work from the first year of my PhD (in collaboration with Bowen Jing and
@HannesStaerk
) and gave a unified perspective of the technical ideas underlying torsional diffusion and DiffDock + some new results! 1/N
Very cool to see Jensen Huang, once again, highlight DiffDock as a key component of the
@nvidia
#BioNeMo
platform for virtual screening in his much-anticipated keynote at
#GTC24
!
Can ML help us obtain precise approximations of fundamental bioinformatics problems? We present NeuroSEED, a framework to embed biological sequences, and show its effectiveness in hyperbolic space and how it can be used for hierarchical clustering and MSA
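To make "embedding in hyperbolic space" concrete, here is a minimal sketch of the geodesic distance in the Poincaré ball model. Using this particular model is an assumption for illustration, and the function name is hypothetical; this is not NeuroSEED's actual code.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Assumption: the Poincare ball model of hyperbolic space; illustrative only.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Distances grow rapidly as either point approaches the boundary of the ball.
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))
```

Points near the boundary end up far from everything else, which is what makes this geometry a natural fit for tree-like, hierarchical data.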
Slides from my talks on "Improving the Aggregation in Graph Networks: can nodes understand their neighbourhoods?" I gave at AstraZeneca and the University of Cambridge are available:
Busy week ahead at
#NeurIPS2021
: on Thursday I'll be presenting our work on NeuroSEED. Come to learn about a new geometric ML approach to many bioinformatics tasks!
Paper:
Presentation and poster:
1/3
Wow! A new exciting preprint from
@timrpeterson
characterizes a novel drug and identifies lipid metabolism as an intersection of mTOR and NAD+ pathways! Amazing to read that DiffDock played a small part in their discovery by suggesting the binding site!
Today we release Chroma, our generative model for protein design, open-source on Github with a Nature publication describing its biophysical and crystallographic validation. Chroma Conditioners enable one model to target multiple protein design objectives.
Life update: next fall I'll be starting a PhD at MIT to work on structured AI models for drug discovery and biochemistry! A huge thank you to all the mentors and friends that have helped me along this journey!
Thrilled to announce that this summer I will join
@mmbronstein
@emaros96
and many other great scientists
@TwitterResearch
to tackle some of the challenges in Graph Representation Learning and Geometric Deep Learning!
@terns
The US immigration system makes it incredibly hard for international scientists to stay and thrive. People often mention the existence of visas like EB-1 for talented individuals, but as a PhD student, I would have to spend 12-18 months without leaving the US for the process...
I strongly believe that the U.S. is at its best when it brings in talented scientists and engineers from all over the world to study, to be hired into companies to push forward our most advanced science and engineering efforts, and to start new U.S.-based companies. We should
Dear academic, biotech & drug discovery twitter colleagues, I need your help!
I'm collecting a list of benchmarks & evaluation datasets for protein-small molecule affinity and virtual screening capacity (e.g. published hit discovery campaign results), which ones do you recommend?
Very interesting work showing that traditional docking methods do not work when run on computationally generated structures.
This confirms what we report in where we also highlight how DiffDock is instead able to retain a large portion of its accuracy!
"How accurately can one predict drug binding modes using AlphaFold models?"
Despite being more accurate than old-school homology models, these models are not yet amenable to accurate docking of inhibitors
Very cool to see DiffDock featured in
@MIT
News! Thank you to Alex and
@timrpeterson
for writing and contributing to the article!
I'm excited to see the amazing biology it could help uncover!
A new update to DiffDock-L with significantly reduced inference-time memory requirement is now live on GitHub and
@HuggingFace
! Credit for the update to Jacob Silterra!
GitHub:
HuggingFace Space:
New review on diffusion models for protein structures and docking! It has been crazy to see this field develop in the past 2 years, we tried to shed light on the different ideas and design choices that have emerged!
Our review on diffusion models for protein structures and docking is out
@HannesStaerk
@GabriCorso
Bowen Jing
@BarzilayRegina
Tommi Jaakkola. This summarizes the advancements up to 2023. There have been lots of exciting new works since!
Very interesting article from
@Nature
about how the quality of the education given by Italian universities is highly undervalued in rankings and many of its best students are hired by top US universities
I had a wonderful time attending the Molecular Machine Learning Conference in Montreal on Monday. With a combination of outstanding research and new friendships, it was a great way to remember our dear friend, mentor, and colleague Octavian, a year after his tragic passing 🙏
Next week I'll be at
#NeurIPS2023
, I'd love to meet as many people as possible! Reach out if you'd like to chat!
Also, remember to come to
@workshopmlsb
on Friday! 🤗
I'm arriving in New Orleans for
#NeurIPS2022
tomorrow. It's my first in-person ML conference, and I look forward to meeting so many researchers for the first time, including long-time collaborators! If you are interested in chatting with me, feel free to send me a direct message!
Looking forward to presenting tomorrow at the seminar hosted by the great
@mmbronstein
and
@befcorreia
! The AF3 release has demonstrated once more the effectiveness of diffusion models for docking & the need for proper benchmarks in the community... it will be a fun discussion!
💥 Happening this Friday!
🚀
@GabriCorso
's talk about DockGen - strategies to improve generalization of docking models through scaling, careful dataset curation, and a new self-training paradigm (confidence bootstrapping).
Hosted by
@mmbronstein
and
@befcorreia
📅 May 17,
Happy to announce that our paper on "Principal Neighbourhood Aggregation for Graph Nets" was accepted at the GRL+ ICML 2020 Workshop! You can find the updated paper with new interesting results on arXiv!
New research on GNN expressivity! Generalising theoretical results from GIN to uncountable feature spaces, yielding PNA, an empirically powerful GNN aggregator.
Work led by
@GabriCorso
and
@lukecavabarrett
, alongside
@dom_beaini
, and
@pl219_Cambridge
:)
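For readers new to PNA, a minimal sketch of the idea on scalar neighbour features: several aggregators are combined with degree-based scalers. It assumes four aggregators (mean, max, min, std) and three scalers (identity, amplification, attenuation); `pna_aggregate` and the hand-picked `delta` are illustrative names and values, not the released implementation.

```python
import math

def pna_aggregate(neighbour_feats, delta):
    """Combine several aggregators with degree scalers for one node.

    neighbour_feats: non-empty list of scalar features from the neighbours.
    delta: normalisation constant for the degree scalers (chosen by hand here).
    """
    d = len(neighbour_feats)
    mean = sum(neighbour_feats) / d
    var = sum((x - mean) ** 2 for x in neighbour_feats) / d
    aggregated = [mean, max(neighbour_feats), min(neighbour_feats), math.sqrt(var)]
    amplification = math.log(d + 1) / delta
    scalers = [1.0, amplification, 1.0 / amplification]
    # Every scaler is applied to every aggregator: 3 x 4 = 12 output values.
    return [s * a for s in scalers for a in aggregated]
```

In a full layer these 12 values would be passed through a learned transformation; the sketch stops at the aggregation step.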
Next Tuesday I'll be giving the first talk on Particle Guidance (joint work with
@xuyilun2
&
@ValentinDeBort1
)! 🎉 I'll also be sharing some exciting new results that haven't even made it to arXiv yet! 🔥 Join us!
Next week at M2D2,
@GabriCorso
will discuss how we can improve the sample efficiency and diversity of generative models for molecular conformer generation and more.
Join us live on Tuesday, November 28th at 11 AM ET.
See more details here:
It was a pleasure visiting
@Caltech
this week to give a guest lecture in
@davidvanvalen
's class! Great hospitality (thx
@RohitDilip8
!), met some amazing researchers, and even got to do a few nice hikes in LA!
The first edition of the Molecular Machine Learning
#MoML
conference is starting in person at MIT! Amazing set of speakers and posters lined up in memory of Octavian Ganea!
@MIT_CSAIL
@AIHealthMIT
@kochinstitute
Tomorrow at 1:15pm GMT, I'll give a presentation to the Cambridge AI group on the expressive power of graph neural networks and my recent works on PNA and DGN. The talk should last ~45min and is open to everyone, so come by if you are interested!
If you are in Vienna for
#ICLR2024
, come around to our poster to discuss how to improve docking generalization and how to integrate RL supervision in diffusion models!
@arthurdeng0205
and I will be at poster
#260
in Hall B this afternoon 4.30-6.30pm!
DiffDock is now on
@huggingface
Spaces with interactive visualization of the diffusion steps. Built using
@Gradio
and 3dmol.js and you can upload your own ligands as SMILES or sdf/mol2. Happy docking!
1/2
This year I have the luck of being an organizer of my favorite workshop! The Machine Learning in Structural Biology workshop will be back at
#NeurIPS
once more 🎉! Call for papers coming soon!
The Machine Learning in Structural Biology workshop will be back at
#NeurIPS
once more! MLSB will be an in-person workshop held in New Orleans in December.
Website & Speaker Lineup:
Mailing List:
New review published in Nature
@MethodsPrimers
about GNNs in life sciences! Aims to be a resource for new researchers and practitioners in the field to understand the potential and drawbacks of these models
We have a new little review on GNNs in life sciences in "Nature Reviews Methods Primers"!
We try to keep it focused and provide some interesting considerations instead of just listing papers. I hope that makes it useful - let me know what you think!
1/2
Octavian was a wonderful person. He was incredibly smart and yet so kind, humble and generous. I'll cherish his teachings and example for the rest of my life. My thoughts at this moment are with his family whom he dearly loved. Thank you for everything Octavian, rest in peace 🙏
While we wait for the surprisingly long arXiv approval, you can get an exclusive preview of our new paper tomorrow at
@HannesStaerk
's reading group!!
(Zoom link and preprint on his website )
This Tuesday
@GabriCorso
and Bowen Jing present their new paper "Torsional Diffusion for Molecular Conformer Generation"!
Beating the SOTA by a huge margin!
Find paper + Zoom for Tue 3pm UTC here:
With:
Jeffrey Chang
@BarzilayRegina
Tommi Jaakkola
The recording of my talk on Theoretical Foundations of Graph Neural Networks is now live (+slides in desc.)! 🕸️
Join me as I derive GNNs from first principles, motivate their use in the sciences, and explain how they emerged along several research lines.
First, we realized UniProt IDs or sequence similarity splits do not properly distinguish between evolutionarily conserved pockets. Instead, we propose DockGen, a new benchmark based on binding protein domain splits and compatible with PDBBind training
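A minimal sketch of what a domain-based split means in practice: every complex sharing a binding-domain label goes to the same side of the split. The IDs, labels and the `split_by_domain` helper are hypothetical and only illustrate the grouping idea, not how DockGen itself was constructed.

```python
def split_by_domain(complexes, test_domains):
    """Split (complex_id, domain_label) pairs so that no domain spans both sets.

    complexes: iterable of (complex_id, domain_label) pairs (hypothetical labels).
    test_domains: set of domain labels held out for testing.
    """
    train, test = [], []
    for complex_id, domain in complexes:
        # Membership is decided by the domain label, never by the individual complex.
        (test if domain in test_domains else train).append(complex_id)
    return train, test
```

Compared to splitting by protein ID, this keeps pockets from the same domain family from appearing in both training and evaluation.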
At MoML
@NaefLuca
will present some super important work that will provide the foundation for more advancements in ML for protein-protein docking for years to come! Register for MoML not to miss it!
At the
#MoML
conference on Nov 8th at MIT
@NaefLuca
from will announce their new dataset to advance the state of Protein-Protein docking!
Sign up for the free registration waitlist:
Or to register:
I'm in Vienna for
#ICLR2024
this week! Looking forward to catching up with many friends and colleagues, meeting new amazing researchers, and chatting about science!
A new Machine Learning Engineer position is open in our group! Come to work with us on developing open-source AI tools for drug discovery and biomedicine such as
#DiffDock
,
#FrameDiff
,
#RFDiffusion
, and many more! (Please RT!)
Join us at
#JameelClinic
, a global hub leading in knowledge, innovation, and AI for health + biotech! If you're a machine learning engineer, we invite you to become an integral member of this dynamic ecosystem.
🌟Apply now:
Hannes always goes straight to the point! 🙂 Take a look at this new great paper from Hannes and Bowen on how to do flow matching on discrete data by carefully designing well-behaved flows on the simplex! Plus some cool DNA design applications! 🧬
New paper :)
"Dirichlet Flow Matching with Applications to DNA Sequence Design"
TLDR
1. try linear flow matching on simplex
2. oh problem: explain
3. fix it with Dirichlet flow matching
4. Try on DNA, nice, better than language model
1/4
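Step 1 of the TLDR can be sketched as follows: draw a noisy point uniformly on the simplex and move it in a straight line towards the one-hot vertex of the target class. This is a toy illustration with hypothetical helper names; it does not cover the problem or the Dirichlet fix from steps 2 and 3.

```python
import random

def sample_uniform_simplex(k, rng):
    """Draw a point uniformly from the (k-1)-simplex by normalising gamma draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def linear_path_point(x0, target_index, t):
    """Point at time t on the straight line from x0 to the one-hot target vertex."""
    return [(1.0 - t) * x + (t if i == target_index else 0.0)
            for i, x in enumerate(x0)]
```

Because both endpoints lie on the simplex, every intermediate point does too, e.g. `linear_path_point(sample_uniform_simplex(4, random.Random(0)), 2, 0.3)` still sums to 1.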
Thanks to the amazing help of Jacob Silterra and the grant from
@huggingface
&
@Gradio
you can now use the new version of DiffDock directly on Hugging Face for free!
DiffDock-Web
A Gradio demo from MIT researchers. DiffDock is a diffusion model for molecular docking & an increasing number of researchers are expected to be interested in using it. Servicing the model/demo through HuggingFace helps Scientists to share/explore the model easily.
Unfortunately the same happens with scientific open-source software... for some reason people believe that it is the job of academics and other volunteers to actively maintain open-source software, and almost no pharma company seriously invests in it
No, it was not a joke. "Our paying customers need X, when will you fix it?" may not be the best way to introduce yourself to an open source project.
#TodayInOpenSource
💊 How is generative AI about to change the game for drug discovery?
@nvidia
's Anthony Costa writes why tools like DiffDock are leading to a major inflection point in research on human health and biology for
@HealthTechMag
Not sure if your paper will make the cut for NeurIPS? Are you tired of poor reviews from overloaded and unrewarded reviewers? Submit the abstract by Friday to
@LogConference
(paper deadline on Sep 16th)!!
Hey y'all 👋! Please consider submitting your work to
@LogConference
. Abstracts are due on 9/9 AoE!
🔥 2 tracks: extended abstracts (4p) & full papers (9p)
🔥 focus on high-quality reviews
🔥 PMLR proceedings for accepted papers
Spread the word! 🙏
CfP:
Bowen and I will be presenting Torsional Diffusion later this morning at poster
#438
at the poster session (11AM) at
#Neurips2022
. Looking forward to seeing many of you!
The new version of the manuscript on Particle Guidance is now live on arXiv ! 🔥 Check it out for new exciting theoretical and methodological results and come to our virtual seminar tomorrow to hear about the overall work! 🤗
You can find the updated manuscript on arXiv + we have released new code that integrates ESM embedding generation and ESMFold structure prediction directly with DiffDock
From today, the majority of foreign university students cannot bring family members to the UK.
In 2024, we’re already delivering for the British people.
To all the tweeps that I met during my internship, thank you again for having been so welcoming and good luck for whatever happens in the future! Please reach out if I can be of any help!
We all know the reviewing process has its flaws, but, in our case, some negative constructive reviews from ICLR really helped us improve the paper and get it accepted as an oral at ICML!
Our DGN paper was accepted as a long oral presentation (top 3%) at the ICML2021 conference of machine learning!! In this paper, we generalize the notion of directions beyond Euclidean spaces
@Saro2000
@vincentmillions
@GabriCorso
@pl219_Cambridge
Will H.
On Monday and Tuesday,
@HannesStaerk
will (somehow?!) present our new paper on 3D InfoMax at the AI4S, ML4PH, SSL and ML4Molecules workshops. Come to discover how we can leverage the large amounts of 3D molecular data available for pretraining!
Paper:
2/3
Super cool blog post about the universality of graphs and sets! The best resource that I have seen in all these years to understand the importance of considering continuity when studying expressiveness! A concept at the foundations of PNA that I hope will be further explored!
I have recently had a range of very insightful conversations with
@PetarV_93
about graph neural networks, networks on sets, universality and how ideas have spread in the two communities. This is our write up, feedback welcome as always! :)
➡️ ☕️
As many of you know I can be very opinionated about GNNs 🙂, there was definitely a filter, but I'm sure some considerations in the paper may be considered controversial ...
It was pretty fun to do this with
@GabriCorso
, Stefanie Jegelka, Tommi Jaakkola, and
@BarzilayRegina
.
As always,
@GabriCorso
came up with many interesting considerations, not all of which made it into the paper :)
2/2
Finally, we dedicate this work to our dear colleague, mentor, and friend Octavian Ganea (1987-2022) without whom this project would have never been possible. Thank you for everything Octavian, rest in peace.
Congratulations to
#JameelClinic
AI faculty lead
@BarzilayRegina
on being inducted into
@theNAEng
! This immense professional achievement is awarded to those who have distinguished themselves through extraordinary technical accomplishments & leadership.
How can we quickly and accurately predict the binding structure of a small molecule ligand to a protein?
Next week at M2D2,
@HannesStaerk
will show us how a generative modelling approach via diffusion is improving molecular docking.
Join us on Tuesday, Dec 13th at 11am ET!
Today we'll present "Principal Neighbourhood Aggregation for Graph Nets" during Session 6 of
#NeurIPS2020
! Come and chat with us in Town B1 Spot D1!
Poster 5-7pm GMT / 9-11am PT at
Paper:
I will be in Chicago for
#ACS
from tomorrow (Tuesday) to Thursday. The imposter syndrome of a machine learning scientist going to a chemistry conference has started to kick in, so if you are around and interested in chatting, please send me a message! 😉
It has been a pleasure to co-supervise
@Ahmed_AI035
's Master's thesis! Check out his great work on Graph Anisotropic Diffusion to discover how you can achieve efficient and global anisotropic kernels for message passing. Could this research direction be the future of Graph ML?
📢Very excited to share our work Graph Anisotropic Diffusion, which I will present today at two
#ICLR2022
workshops! (GTRL and MLDD) 🎉🥳🥳🥳
Extremely grateful to my advisor
@mmbronstein
and my collaborators
@GabriCorso
and
@HannesStaerk
🤗
Shocked. I'm relatively new to the field and probably naive. But here are some suggestions to reduce this behavior: (1) very tough sanctions (people should be scared of losing their career) (2) system and protection for whistleblowers (should NOT be scared of losing their career)...
Just finished presenting our work on the "Principal Neighbourhood Aggregation" at the ELLIS Workshop on Geometric and Relational Deep Learning. Thank you to the organisers
@thomaskipf
and
@erikjbekkers
and to all the participants for the great insights!
A new annual conference at MIT organized in memory of Octavian
@octavianEganea
. It's open to the public and there will be great speakers and attendees from academia and industry!
We are excited to announce that our inaugural 2022 Molecular ML conference site is now live! 😃 Registrations & poster submissions are now open: (Students register for free! 🌟 See More Info+ @ registration)
@timrpeterson
Btw exciting updates on DiffDock are coming soon! Stay tuned!
In the meantime, you can freely access DiffDock and reach out to us if you are interested in collaborating!
Can we design pretraining tasks that truly work for molecular property prediction? We are not there yet I think, but this work makes a significant step in that direction!
It was a pleasure to have the chance to contribute to the supervision of Hannes's Master's thesis!
Camera-ready version is out with some new ablation studies on the performances on different types of graphs - data suggests that the biggest limitation to these models is now the message passing framework itself...
Paper:
Code:
And here is the preprint! We’ve added many, many more models and methods for protein engineering and analysis. Using TRILL, we’ve demonstrated workflows that have enabled us to generate putative cell penetrating peptides with only 4 lines of code
Very cool to see that
@PyG_Team
has decided to make "Aggregation a first-class principle", significantly simplifying the control of the aggregation functions. It is crazy to look back at how fiddly it was when we worked on PNA in 2020 (one of the reasons we initially used DGL)!
Latest release PyG 2.1.0 is out!
The new functionality includes principled aggregations, link-level and temporal samplers, data pipe support, and many more features!
See here for the full release notes:
🎉Introducing Scientist Spotlight!🎉 In this series, we'll be chatting with trailblazers, visionaries, and pioneers in the biotech realm who are part of the
#Superbio
community.
we've got a great seminar coming on Wednesday next week!
@GabriCorso
,
@HannesStaerk
and Bowen Jing will share their latest work
"DiffDock: Diffusion Steps, Twists and Turns for Molecular Docking and Beyond!"
Seminar starts at 7pm EST
see for details
Wow, I'm going to Turkey this winter and I'm making my itinerary (and getting summaries of the different places) entirely with
#ChatGPT
! Science is amazing! 🤯
@emaros96
If you want to measure impact you must have a time lag, otherwise you are just encouraging incremental improvement and measuring trendiness and publicity. Acceptance at conference/journal cannot be a measure of impact (it would be impossible), but in case of the quality of submission
Really interesting work on efficient convolutions & aggregations in GNNs!
It discusses the poorly understood relationship of node-based vs edge-based aggregation. A few comments from my experience (1/N)
I am delighted to announce our new work "Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions" w/
@FelixOpo
@pl219_cambridge
@niclane7
We provide code + pretrained models; find our blog post here:
Are you always complaining about the high computational cost of SO(3)-equivariant networks?
📣Excited to present "Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs" or eSCN (in short) at ICML!
Paper:
Code:
1/6
Just 4 days to submit the abstract for the new Learning on Graphs
#LoG
conference (both short paper and long paper tracks with PMLR proceedings)! Submit to get high-quality feedback at the first conference with substantial monetary awards for reviewers to encourage very high review quality!
Please spread the word that the
@LogConference
abstract submission deadline is on Sept 9th!
Submit your papers!
We ensure excellent reviewers for both the 9 page full paper track (in PMLR proceedings if accepted) and the 4 page extended abstract track 👌
Really great work from
@HannesStaerk
! Pretraining models for molecular property prediction is very important given the small datasets, but doing it in a principled way is very hard. Here we teach models to reason about molecular dynamics from large amounts of 3D conformers 🧪🧑🔬