Alex J Li
@alex_j_li
165 Followers · 7 Following · 13 Media · 22 Statuses
MIT '22 | UCB-UCSF BioE PhD student @kortemmelab & Pinney Lab | Interested in geometric/graph ML and program synthesis in chemistry and biomolecular design
Joined March 2022
Excited to finally release ProteinZen, an SE(3)-equivariant all-atom protein generative model enabling unconditional all-atom monomer generation and SoTA all-atom motif scaffolding performance! Preprint here: https://t.co/TFhc8T2CXr 🧵1/n
biorxiv.org
The advent of generative models of protein structure has greatly accelerated our ability to perform de novo protein design, especially concerning design at coarser physical scales such as backbone...
Huge shoutout to my advisor Tanja @KortemmeLab for supporting me throughout this work! Code is available at https://t.co/rSv2w9fEFx; please try it out and be on the lookout for future updates as we extend ProteinZen to more design tasks! 8/8
github.com
All-atom generative protein design using SE(3) flow matching - alexjli/proteinzen
When the scaffold sequence is redesigned with ProteinMPNN, ProteinZen still achieves the highest unique success rate on the majority of tasks, including against RFDiffusion! 7/
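For context, success rates in benchmarks like this are typically measured with a self-consistency ("designability") loop. Below is a minimal sketch of that protocol; `design_sequence` and `predict_structure` are placeholders for tools such as ProteinMPNN and ESMFold, and the 2 Å cutoff and sequence count are common defaults, not necessarily the paper's exact settings.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid alignment."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    V, S, Wt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(V @ Wt))           # guard against reflections
    R = V @ np.diag([1.0, 1.0, d]) @ Wt
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

def is_designable(backbone_ca, design_sequence, predict_structure,
                  n_seqs: int = 8, rmsd_cutoff: float = 2.0) -> bool:
    """A generated backbone counts as a success if any redesigned sequence
    refolds to within `rmsd_cutoff` angstroms of the generated structure."""
    for _ in range(n_seqs):
        seq = design_sequence(backbone_ca)       # e.g. ProteinMPNN
        pred_ca = predict_structure(seq)         # e.g. ESMFold / AlphaFold
        if kabsch_rmsd(backbone_ca, pred_ca) <= rmsd_cutoff:
            return True
    return False
```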
Furthermore, ProteinZen shines on motif scaffolding challenges! On one-shot motif scaffolding benchmarks, ProteinZen has the highest unique task success rate across the majority of tasks, for both indexed and unindexed motif scaffolding. 6/
ProteinZen performs competitively on unconditional monomer generation benchmarks, on par with previous methods such as Pallatom and outperformed only by the concurrent work La-Proteina. 5/
We also develop an SDE integration strategy for sampling from ProteinZen, which we find yields better performance and finer control over sampling than previous ODE integration approaches. 4/
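To illustrate the distinction, here is a toy 1-D sketch of deterministic (Euler/ODE) versus stochastic (Euler-Maruyama/SDE) integration of a flow model's velocity field. The closed-form `velocity` and the noise scale `g` are illustrative stand-ins, not ProteinZen's actual SE(3) sampler or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Closed-form velocity for a toy flow whose data distribution is a point
    mass at 0 under the linear interpolant x_t = t * noise."""
    return x / max(t, 1e-3)

def sample(x1: np.ndarray, n_steps: int = 100, stochastic: bool = False,
           g: float = 0.3) -> np.ndarray:
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) down to t = 0."""
    x, dt = np.asarray(x1, dtype=float).copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        x = x - velocity(x, t) * dt              # deterministic ODE (Euler) step
        if stochastic:
            # Euler-Maruyama: add a Brownian kick; the matching drift
            # correction is omitted here for brevity.
            x = x + g * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# sample(rng.standard_normal(8))                   -> deterministic ODE sample
# sample(rng.standard_normal(8), stochastic=True)  -> stochastic SDE sample
```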
When we express proteins as collections of frames, we can use SE(3) flow matching to train a generative model over frames and generate all-atom protein structures! ProteinZen extends IPA Transformers with multi-scale modeling and is trained as a denoising frame predictor. 3/
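As a concrete (and heavily simplified) illustration of training a denoising frame predictor, here is a sketch of one flow-matching step over a single rigid frame, with geodesic interpolation for the rotation and linear interpolation for the translation. `model` is a placeholder for the actual multi-scale IPA network, and the losses are illustrative rather than the paper's exact objectives.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

rng = np.random.default_rng(0)

def corrupt_frame(R_data: Rotation, x_data: np.ndarray, t: float):
    """Interpolate a clean frame toward a random frame by amount t in [0, 1]."""
    R_noise = Rotation.random()
    x_noise = rng.standard_normal(3)
    R_t = Slerp([0.0, 1.0], Rotation.concatenate([R_data, R_noise]))(t)
    x_t = (1.0 - t) * x_data + t * x_noise
    return R_t, x_t

def training_step(model, R_data: Rotation, x_data: np.ndarray) -> float:
    """model(R_t, x_t, t) -> (R_hat, x_hat), a denoised frame prediction."""
    t = float(rng.uniform())
    R_t, x_t = corrupt_frame(R_data, x_data, t)
    R_hat, x_hat = model(R_t, x_t, t)
    rot_loss = float((R_hat.inv() * R_data).magnitude()) ** 2  # squared geodesic distance
    trans_loss = float(np.sum((x_hat - x_data) ** 2))
    return rot_loss + trans_loss
```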
One of the challenges in developing all-atom methods is designing a tractable residue representation that is sufficiently expressive of atomic and sequence detail. In this work, we propose representing residues as sets of oriented rigid chemical fragments (and, in turn, frames). 2/
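As a rough sketch of the fragments-to-frames idea: an oriented rigid frame can be read off from three atoms of a rigid fragment via Gram-Schmidt, in the spirit of AlphaFold-style frame construction. Which atoms define each fragment's frame is specified in the paper; the choice below is purely illustrative.

```python
import numpy as np

def frame_from_three_atoms(a, b, c):
    """Return (R, t): a rotation matrix and origin built from three atom
    positions (each shape (3,)), with the origin at the central atom b."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    e1 = c - b
    e1 /= np.linalg.norm(e1)
    u2 = a - b
    e2 = u2 - np.dot(u2, e1) * e1        # remove the component along e1
    e2 /= np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                # right-handed completion
    R = np.stack([e1, e2, e3], axis=-1)  # columns are the frame axes
    return R, b
```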
At NeurIPS this weekend presenting an MLSB poster on my current progress with ProteinZen, an all-atom protein structure generation method: find me there or DM me to chat about all things protein! Paper: https://t.co/Dgw5fkRDmD
Code is available at https://t.co/23m9u6SCeL 12/12
github.com
Neurally-derived Potts models for protein design, inspired by dTERMen - alexjli/terminator_public
I'd like to thank my advisors Amy and Gevorg for all their support and guidance, my co-authors Mindren, Israel, and @vikramsundar for being fun collaborators with great ideas, and @SassSeabass for teaching me all things dTERMen. Without them, none of this would have been possible! 11/
Lastly, we find a disconnect between NSR and energy-based metrics: Potts parameter regularization improves NSR but not energy predictions, suggesting future directions for energy-based objectives when training and evaluating new protein design models. 10/
Our models can also be improved in a generalizable fashion via finetuning on experimental data. Finetuning on the Bcl-2 affinity dataset increases performance on Rocklin stability predictions, despite the Rocklin dataset being composed of de novo folds. 9/
Additionally, despite not being explicitly trained to predict energies, our models perform well on energy-based tasks, including a Bcl-2 binding affinity dataset (Frappier et al., Structure 2019) and a de novo protein stability dataset (Rocklin et al., Science 2017). 8/
To assess fold specificity, we show that designed sequences tend to fold to their target structure as predicted by AlphaFold; the same trend is not observed for a randomized-sequence baseline. 7/
Via model ablations, we show that both TERMs and the Potts model output contribute to increasing NSR. In designed sequences, we also see physicochemically realistic substitutions when a non-native residue label is chosen. 6/
We present two models: TERMinator, using both TERM and backbone coordinates as input, and COORDinator, using only coordinates. Both output protein-specific Potts models over sequence labels, which can be optimized for sequence design or used to predict energies of mutations. 5/
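For readers unfamiliar with Potts models: the output is a set of single-site terms h and pairwise couplings J that together score any sequence, E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j). Below is a minimal sketch of scoring and mutation-energy prediction under such a model; the tensor shapes and conventions are illustrative, not the exact TERMinator output format.

```python
import numpy as np

def potts_energy(seq: np.ndarray, h: np.ndarray, J: np.ndarray) -> float:
    """seq: (L,) int labels in [0, A); h: (L, A); J: (L, L, A, A), symmetric."""
    L = len(seq)
    e = sum(h[i, seq[i]] for i in range(L))          # single-site terms
    for i in range(L):
        for j in range(i + 1, L):
            e += J[i, j, seq[i], seq[j]]             # pairwise couplings
    return float(e)

def mutation_delta(seq, h, J, i, a) -> float:
    """Predicted energy change from mutating position i to label a."""
    mutant = seq.copy()
    mutant[i] = a
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

Optimizing the sequence labels against this landscape (e.g. by simulated annealing or greedy mutation) gives sequence design, while `mutation_delta` is the mutation-energy use case mentioned above.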
In this work, we tackle these concerns by 1) using TERMs, small recurring local-in-space structural motifs, to implicitly model protein flexibility, and 2) predicting an energy landscape (a Potts model) over sequence space rather than a sequence directly. 4/
However, current models assume a static backbone structure with no structural flexibility, and they design sequence directly on structure, which can be difficult to adapt to energy-based questions or to sequence optimization under discrete constraints. 3/
Neural nets have been taking protein design by storm, with one of the most promising results being strong performance on native sequence recovery (NSR): the task of predicting the native sequence of a protein given only its backbone structure. 2/
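For reference, NSR is simply the fraction of positions where the designed sequence matches the native one; a minimal sketch:

```python
def native_sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native."""
    assert len(designed) == len(native)
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```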