Alex J Li Profile

Alex J Li (@alex_j_li)

Followers: 165 · Following: 7 · Media: 13 · Statuses: 22

MIT '22 | UCB-UCSF BioE PhD student @kortemmelab & Pinney Lab | Interested in geometric/graph ML and program synthesis in chemistry and biomolecular design

Joined March 2022

Alex J Li (@alex_j_li) · 23 days
Excited to finally release ProteinZen, an SE(3)-equivariant all-atom protein generative model enabling unconditional all-atom monomer generation and achieving SoTA all-atom motif scaffolding performance! Preprint here: https://t.co/TFhc8T2CXr 🧵1/n
Link card (biorxiv.org): The advent of generative models of protein structure has greatly accelerated our ability to perform de novo protein design, especially concerning design at coarser physical scales such as backbone...
Replies: 3 · Reposts: 17 · Likes: 112

Alex J Li (@alex_j_li) · 23 days
Huge shoutout to my advisor Tanja @KortemmeLab for supporting me throughout this work! Code is available at https://t.co/rSv2w9fEFx; please try it out, and be on the lookout for future updates as we extend ProteinZen to more design tasks! 8/8
Link card (github.com): All-atom generative protein design using SE(3) flow matching · alexjli/proteinzen
Replies: 0 · Reposts: 1 · Likes: 1

Alex J Li (@alex_j_li) · 23 days
When the scaffold sequence is redesigned with ProteinMPNN, ProteinZen still achieves the highest unique success rate on the majority of tasks, outperforming even RFDiffusion! 7/
Replies: 1 · Reposts: 0 · Likes: 1

Alex J Li (@alex_j_li) · 23 days
Furthermore, ProteinZen shines on motif scaffolding challenges! On one-shot motif scaffolding benchmarks, ProteinZen has the highest unique task success rate across the majority of tasks, whether performing indexed or unindexed motif scaffolding. 6/
Replies: 1 · Reposts: 0 · Likes: 0
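
For context on the metric: "unique success rate" in motif-scaffolding benchmarks is typically computed by counting designs that both recapitulate the motif and refold self-consistently, then deduplicating structurally similar scaffolds. Below is a minimal sketch of that general recipe; the cutoffs and field names are illustrative assumptions, not the paper's exact evaluation.

```python
def unique_success_rate(designs, n_attempts,
                        motif_rmsd_cut=1.0, sc_rmsd_cut=2.0):
    """Generic motif-scaffolding metric; cutoffs are illustrative.

    A design "succeeds" if the motif is recapitulated (motif RMSD
    below cutoff) and the scaffold refolds to itself (self-consistency
    RMSD below cutoff). Successes are deduplicated via a precomputed
    structural cluster id so near-identical scaffolds count once.
    """
    successes = [d for d in designs
                 if d["motif_rmsd"] < motif_rmsd_cut
                 and d["sc_rmsd"] < sc_rmsd_cut]
    unique_clusters = {d["cluster_id"] for d in successes}
    return len(unique_clusters) / n_attempts
```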

Alex J Li (@alex_j_li) · 23 days
ProteinZen performs competitively on unconditional monomer generation benchmarks, on par with previous methods such as Pallatom and outperformed only by the concurrent work La-Proteina. 5/
Replies: 1 · Reposts: 0 · Likes: 2

Alex J Li (@alex_j_li) · 23 days
We also develop an SDE integration strategy for sampling from ProteinZen, which we find yields better performance and finer control over sampling than previous ODE integration methods. 4/
Replies: 1 · Reposts: 0 · Likes: 1
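
To make the ODE-vs-SDE contrast concrete, here is a toy Euclidean sketch (not the paper's SE(3) scheme): the deterministic sampler follows the learned vector field alone, while a standard way to add stochasticity is an Euler-Maruyama step with a score-weighted drift correction plus injected noise, where the noise scale g is the extra knob for controlling sampling. Both v and score stand in for hypothetical model outputs.

```python
import numpy as np

def ode_step(x, v, dt):
    """Deterministic Euler step along a learned vector field v(x, t)."""
    return x + v * dt

def sde_step(x, v, score, dt, g, rng):
    """Euler-Maruyama step for an SDE with matching marginals:
    drift = vector field + score correction, plus injected noise.
    The noise scale g trades sample diversity against fidelity."""
    drift = v + 0.5 * g ** 2 * score
    return x + drift * dt + g * np.sqrt(dt) * rng.normal(size=np.shape(x))
```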

Alex J Li (@alex_j_li) · 23 days
When we express proteins as collections of frames, we can use SE(3) flow matching to train a generative model over frames and generate all-atom protein structures! ProteinZen extends IPA Transformers with multi-scale modeling and is trained as a denoising frame predictor. 3/
Replies: 1 · Reposts: 0 · Likes: 1
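
A minimal sketch of that training recipe in the usual SE(3) flow-matching style (my paraphrase, not the paper's code): translations interpolate linearly and rotations follow the geodesic between a noise frame and a data frame, and the network is trained to recover the clean frame from the interpolated one. The `model` and `frame_loss` names are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_frame(R0, t0, R1, t1, tau):
    """Point at time tau on the SE(3) path between frames (R0, t0)
    and (R1, t1): linear in translation, geodesic (slerp) in rotation.
    This is the conditional path a flow-matching model trains along."""
    t_tau = (1.0 - tau) * t0 + tau * t1
    R_tau = Slerp([0.0, 1.0], Rotation.concatenate([R0, R1]))([tau])[0]
    return R_tau, t_tau

# Toy denoising-frame-predictor step: corrupt a data frame by
# interpolating from a random noise frame, then regress the clean frame.
R_noise, R_data = Rotation.random(random_state=0), Rotation.random(random_state=1)
t_noise, t_data = np.zeros(3), np.ones(3)
tau = 0.3
R_tau, t_tau = interpolate_frame(R_noise, t_noise, R_data, t_data, tau)
# pred = model(R_tau, t_tau, tau)              # hypothetical network
# loss = frame_loss(pred, (R_data, t_data))    # rotation + translation terms
```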

Alex J Li (@alex_j_li) · 23 days
One of the challenges in developing all-atom methods is designing a tractable residue representation that is sufficiently expressive of atomic and sequence detail. In this work we propose representing residues as sets of oriented rigid chemical fragments (and, in turn, frames). 2/
Replies: 1 · Reposts: 0 · Likes: 2
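
For intuition on "oriented rigid fragments (and, in turn, frames)": any rigid fragment with three non-collinear atoms defines a rotation and a translation. Below is the standard Gram-Schmidt construction commonly used for backbone frames (e.g., from N, CA, C atoms); the paper's actual fragment decomposition may differ.

```python
import numpy as np

def frame_from_three_atoms(a, b, c):
    """Build an oriented rigid frame (R, t) from three non-collinear
    3D points, with the origin placed at b (e.g., the CA atom)."""
    e1 = (c - b) / np.linalg.norm(c - b)
    u2 = a - b
    e2 = u2 - np.dot(u2, e1) * e1          # remove the e1 component
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                  # right-handed third axis
    R = np.stack([e1, e2, e3], axis=-1)    # columns are the frame axes
    return R, b

# Example: a backbone frame from idealized N, CA, C coordinates.
N, CA, C = np.array([1.46, 0.0, 0.0]), np.zeros(3), np.array([-0.55, 1.42, 0.0])
R, t = frame_from_three_atoms(N, CA, C)
print(np.allclose(R @ R.T, np.eye(3)))     # True: R is a valid rotation
```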

Alex J Li (@alex_j_li) · 11 months
At NeurIPS this weekend presenting an MLSB poster on my current progress on ProteinZen, an all-atom protein structure generation method: find me there or DM me to chat about all things protein! Paper: https://t.co/Dgw5fkRDmD
Replies: 2 · Reposts: 12 · Likes: 77

Alex J Li (@alex_j_li) · 3 years
I'd like to thank my advisors Amy and Gevorg for all their support and guidance, my co-authors Mindren Israel @vikramsundar for being fun collaborators with great ideas, and @SassSeabass for teaching me all things dTERMen. Without them, none of this would have been possible! 11/
Replies: 1 · Reposts: 0 · Likes: 3

Alex J Li (@alex_j_li) · 3 years
Lastly, we find a disconnect between NSR and other energy-based metrics: Potts parameter regularization improves NSR but not energetics predictions, suggesting future directions for energy-based objectives when training and evaluating new protein design models. 10/
Replies: 1 · Reposts: 0 · Likes: 3

Alex J Li (@alex_j_li) · 3 years
Our models can also be improved in a generalizable fashion via finetuning on experimental data. Finetuning on the Bcl-2 affinity dataset increases performance on Rocklin stability predictions, despite the Rocklin dataset being composed of de novo folds. 9/
Replies: 1 · Reposts: 0 · Likes: 3

Alex J Li (@alex_j_li) · 3 years
Additionally, despite not being explicitly trained to predict energies, our models perform well on energy-based tasks, including a Bcl-2 binding affinity dataset (Frappier et al. Structure 2019) and a de novo protein stability dataset (Rocklin et al. Science 2017). 8/
Replies: 1 · Reposts: 0 · Likes: 3

Alex J Li (@alex_j_li) · 3 years
To assess fold specificity, we show that designed sequences (B) tend to fold to their target structure as predicted by AlphaFold. The same trend is not observed when using a randomized-sequence baseline (D). 7/
Replies: 1 · Reposts: 0 · Likes: 1
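
A standard way to quantify "folds to its target structure" is the superposition RMSD between predicted and target coordinates; here is a minimal Kabsch-alignment sketch (a common choice, not necessarily the paper's exact metric).

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal
    rigid superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                  # center both point sets
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P)))
```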

Alex J Li (@alex_j_li) · 3 years
Via model ablations, we show that TERMs and the Potts model output both contribute to increasing NSR. In designed sequences, we also see physicochemically realistic substitutions when a non-native residue label is chosen. 6/
Replies: 1 · Reposts: 0 · Likes: 2

Alex J Li (@alex_j_li) · 3 years
We present two models: TERMinator, using both TERM and backbone coordinates as input, and COORDinator, using only coordinates. Both output protein-specific Potts models over sequence labels, which can be optimized for sequence design or used to predict energies of mutations. 5/
Replies: 1 · Reposts: 0 · Likes: 2
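
Concretely, a Potts model assigns every sequence s an energy E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j): low-energy sequences are candidate designs, and energy differences score mutations. A minimal sketch with hypothetical field (L x A) and coupling (L x L x A x A) tensors:

```python
import numpy as np

def potts_energy(seq, h, J):
    """Potts energy: E(s) = sum_i h[i, s_i] + sum_{i<j} J[i, j, s_i, s_j].
    seq is a list of integer residue labels; h is (L, A), J is (L, L, A, A)."""
    L = len(seq)
    E = sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            E += J[i, j, seq[i], seq[j]]
    return E

def mutation_delta(seq, pos, new_aa, h, J):
    """Energy change of a point mutation; useful for ranking mutations
    or for discrete sequence optimization (e.g., simulated annealing)."""
    mutant = list(seq)
    mutant[pos] = new_aa
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```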

Alex J Li (@alex_j_li) · 3 years
In this work, we tackle these concerns by 1) using TERMs, small recurring local-in-space structural motifs, to implicitly model protein flexibility, and 2) predicting an energy landscape (Potts model) over sequence space as output rather than a direct sequence. 4/
Replies: 1 · Reposts: 0 · Likes: 3

Alex J Li (@alex_j_li) · 3 years
However, current models assume static backbone structures without allowing structural flexibility, and they design sequence directly from structure, which can be difficult to adapt to energy-based questions or to sequence optimization under discrete constraints. 3/
Replies: 1 · Reposts: 1 · Likes: 4

Alex J Li (@alex_j_li) · 3 years
Neural nets have been taking protein design by storm, with some of the most promising results coming from native sequence recovery (NSR) tasks: predicting the native sequence of a protein given only its backbone structure. 2/
Replies: 1 · Reposts: 1 · Likes: 4
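
NSR itself is simply per-position identity between the designed (or predicted) sequence and the native one; a minimal sketch:

```python
def native_sequence_recovery(predicted, native):
    """Fraction of positions where the predicted sequence matches the
    native sequence; both must have equal length."""
    assert len(predicted) == len(native)
    return sum(p == n for p, n in zip(predicted, native)) / len(native)

print(native_sequence_recovery("MKTAYIA", "MKTSYIA"))  # 6/7 ~ 0.857
```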