Alex J Li
@alex_j_li
165 Followers · 7 Following · 13 Media · 22 Statuses
MIT '22 | UCB-UCSF BioE PhD student @kortemmelab & Pinney Lab | Interested in geometric/graph ML and program synthesis in chemistry and biomolecular design
Joined March 2022
Excited to finally release ProteinZen, an SE(3)-equivariant all-atom protein generative model enabling unconditional all-atom monomer generation and SoTA all-atom motif scaffolding performance! Preprint here: https://t.co/TFhc8T2CXr 🧵1/n
biorxiv.org
The advent of generative models of protein structure has greatly accelerated our ability to perform de novo protein design, especially concerning design at coarser physical scales such as backbone...
Huge shoutout to my advisor Tanja @KortemmeLab for supporting me throughout this work! Code is available at https://t.co/rSv2w9fEFx; please try it out and be on the lookout for future updates as we extend ProteinZen to more design tasks! 8/8
github.com
All-atom generative protein design using SE(3) flow matching - alexjli/proteinzen
When the scaffold sequence is redesigned with ProteinMPNN, ProteinZen still achieves the highest unique success rate on the majority of tasks, including against RFDiffusion! 7/
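For context, success rates in benchmarks like this are typically measured with a self-consistency ("designability") loop. Below is a minimal sketch of that protocol; `design_sequence` and `predict_structure` are placeholders for tools such as ProteinMPNN and ESMFold, and the 2 Å cutoff and sequence count are common defaults, not necessarily the paper's exact settings.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid alignment."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    V, S, Wt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(V @ Wt))           # guard against reflections
    R = V @ np.diag([1.0, 1.0, d]) @ Wt
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

def is_designable(backbone_ca, design_sequence, predict_structure,
                  n_seqs: int = 8, rmsd_cutoff: float = 2.0) -> bool:
    """A generated backbone counts as a success if any redesigned sequence
    refolds to within `rmsd_cutoff` angstroms of the generated structure."""
    for _ in range(n_seqs):
        seq = design_sequence(backbone_ca)       # e.g. ProteinMPNN
        pred_ca = predict_structure(seq)         # e.g. ESMFold / AlphaFold
        if kabsch_rmsd(backbone_ca, pred_ca) <= rmsd_cutoff:
            return True
    return False
```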
Furthermore, ProteinZen shines on motif scaffolding challenges! On one-shot motif scaffolding benchmarks, ProteinZen has the highest unique task success rate across the majority of tasks, for both indexed and unindexed motif scaffolding. 6/
ProteinZen performs competitively on unconditional monomer generation benchmarks, on par with previous methods such as Pallatom and outperformed only by the concurrent work La-Proteina. 5/
We also develop an SDE integration strategy for sampling from ProteinZen, which we find yields better performance and finer control over sampling than previous ODE integration approaches. 4/
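To illustrate the distinction, here is a toy 1-D sketch of deterministic (Euler/ODE) versus stochastic (Euler-Maruyama/SDE) integration of a flow model's velocity field. The closed-form `velocity` and the noise scale `g` are illustrative stand-ins, not ProteinZen's actual SE(3) sampler or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Closed-form velocity for a toy flow whose data distribution is a point
    mass at 0 under the linear interpolant x_t = t * noise."""
    return x / max(t, 1e-3)

def sample(x1: np.ndarray, n_steps: int = 100, stochastic: bool = False,
           g: float = 0.3) -> np.ndarray:
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) down to t = 0."""
    x, dt = np.asarray(x1, dtype=float).copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        x = x - velocity(x, t) * dt              # deterministic ODE (Euler) step
        if stochastic:
            # Euler-Maruyama: add a Brownian kick; the matching drift
            # correction is omitted here for brevity.
            x = x + g * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# sample(rng.standard_normal(8))                   -> deterministic ODE sample
# sample(rng.standard_normal(8), stochastic=True)  -> stochastic SDE sample
```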
When we express proteins as collections of frames, we can use SE(3) flow matching to train a generative model over frames and generate all-atom protein structures! ProteinZen extends IPA Transformers with multi-scale modeling and is trained as a denoising frame predictor. 3/
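As a concrete (and heavily simplified) illustration of training a denoising frame predictor, here is a sketch of one flow-matching step over a single rigid frame, with geodesic interpolation for the rotation and linear interpolation for the translation. `model` is a placeholder for the actual multi-scale IPA network, and the losses are illustrative rather than the paper's exact objectives.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

rng = np.random.default_rng(0)

def corrupt_frame(R_data: Rotation, x_data: np.ndarray, t: float):
    """Interpolate a clean frame toward a random frame by amount t in [0, 1]."""
    R_noise = Rotation.random()
    x_noise = rng.standard_normal(3)
    R_t = Slerp([0.0, 1.0], Rotation.concatenate([R_data, R_noise]))(t)
    x_t = (1.0 - t) * x_data + t * x_noise
    return R_t, x_t

def training_step(model, R_data: Rotation, x_data: np.ndarray) -> float:
    """model(R_t, x_t, t) -> (R_hat, x_hat), a denoised frame prediction."""
    t = float(rng.uniform())
    R_t, x_t = corrupt_frame(R_data, x_data, t)
    R_hat, x_hat = model(R_t, x_t, t)
    rot_loss = float((R_hat.inv() * R_data).magnitude()) ** 2  # squared geodesic distance
    trans_loss = float(np.sum((x_hat - x_data) ** 2))
    return rot_loss + trans_loss
```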
One of the challenges in developing all-atom methods is designing a tractable residue representation that is sufficiently expressive of atomic and sequence detail. In this work, we propose representing residues as sets of oriented rigid chemical fragments (and, in turn, frames). 2/
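As a rough sketch of the fragments-to-frames idea: an oriented rigid frame can be read off from three atoms of a rigid fragment via Gram-Schmidt, in the spirit of AlphaFold-style frame construction. Which atoms define each fragment's frame is specified in the paper; the choice below is purely illustrative.

```python
import numpy as np

def frame_from_three_atoms(a, b, c):
    """Return (R, t): a rotation matrix and origin built from three atom
    positions (each shape (3,)), with the origin at the central atom b."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    e1 = c - b
    e1 /= np.linalg.norm(e1)
    u2 = a - b
    e2 = u2 - np.dot(u2, e1) * e1        # remove the component along e1
    e2 /= np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                # right-handed completion
    R = np.stack([e1, e2, e3], axis=-1)  # columns are the frame axes
    return R, b
```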
At NeurIPS this weekend presenting an MLSB poster on my current progress with ProteinZen, an all-atom protein structure generation method: find me there or DM me to chat about all things protein! Paper: https://t.co/Dgw5fkRDmD
Code is available at https://t.co/23m9u6SCeL 12/12
github.com
Neurally-derived Potts models for protein design, inspired by dTERMen - alexjli/terminator_public
I'd like to thank my advisors Amy and Gevorg for all their support and guidance, my co-authors Mindren, Israel, and @vikramsundar for being fun collaborators with great ideas, and @SassSeabass for teaching me all things dTERMen. Without them, none of this would have been possible! 11/
Lastly, we find a disconnect between NSR and energy-based metrics: Potts parameter regularization improves NSR but not energy predictions, suggesting future directions for energy-based objectives when training and evaluating new protein design models. 10/
Our models can also be improved in a generalizable fashion via finetuning on experimental data. Finetuning on the Bcl-2 affinity dataset increases performance on Rocklin stability predictions, despite the Rocklin dataset being composed of de novo folds. 9/
Additionally, despite not being explicitly trained to predict energies, our models perform well on energy-based tasks, including a Bcl-2 binding affinity dataset (Frappier et al., Structure 2019) and a de novo protein stability dataset (Rocklin et al., Science 2017). 8/
To assess fold specificity, we show that designed sequences tend to fold to their target structure as predicted by AlphaFold; the same trend is not observed for a randomized-sequence baseline. 7/
Via model ablations, we show that both TERMs and the Potts model output contribute to increasing NSR. In designed sequences, we also see physicochemically realistic substitutions when a non-native residue label is chosen. 6/
We present two models: TERMinator, using both TERM and backbone coordinates as input, and COORDinator, using only coordinates. Both output protein-specific Potts models over sequence labels, which can be optimized for sequence design or used to predict energies of mutations. 5/
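For readers unfamiliar with Potts models: the output is a set of single-site terms h and pairwise couplings J that together score any sequence, E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j). Below is a minimal sketch of scoring and mutation-energy prediction under such a model; the tensor shapes and conventions are illustrative, not the exact TERMinator output format.

```python
import numpy as np

def potts_energy(seq: np.ndarray, h: np.ndarray, J: np.ndarray) -> float:
    """seq: (L,) int labels in [0, A); h: (L, A); J: (L, L, A, A), symmetric."""
    L = len(seq)
    e = sum(h[i, seq[i]] for i in range(L))          # single-site terms
    for i in range(L):
        for j in range(i + 1, L):
            e += J[i, j, seq[i], seq[j]]             # pairwise couplings
    return float(e)

def mutation_delta(seq, h, J, i, a) -> float:
    """Predicted energy change from mutating position i to label a."""
    mutant = seq.copy()
    mutant[i] = a
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

Optimizing the sequence labels against this landscape (e.g. by simulated annealing or greedy mutation) gives sequence design, while `mutation_delta` is the mutation-energy use case mentioned above.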
In this work, we tackle these concerns by 1) using TERMs, small recurring local-in-space structural motifs, to implicitly model protein flexibility, and 2) predicting an energy landscape (a Potts model) over sequence space rather than a sequence directly. 4/
However, current models assume a static backbone structure with no structural flexibility, and they design sequence directly on structure, which can be difficult to adapt to energy-based questions or to sequence optimization under discrete constraints. 3/
Neural nets have been taking protein design by storm, with one of the most promising results being strong performance on native sequence recovery (NSR): the task of predicting the native sequence of a protein given only its backbone structure. 2/
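For reference, NSR is simply the fraction of positions where the designed sequence matches the native one; a minimal sketch:

```python
def native_sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native."""
    assert len(designed) == len(native)
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```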