
Alex J Li
@alex_j_li
Followers 128 · Following 6 · Media 7 · Statuses 14
MIT '22 | UCB-UCSF BioE PhD student @kortemmelab & Pinney Lab | Interested in geometric/graph ML and program synthesis in chemistry and biomolecular design
Joined March 2022
First Twitter thread 🧵 and also my first bioRxiv preprint! I’m excited to finally release my undergrad work into the world: combining GNNs, Potts models, and Tertiary Motifs (TERMs) for protein design! See the preprint here: https://t.co/aiVoRA8g6S 1/
At NeurIPS this weekend presenting an MLSB poster on my current progress with ProteinZen, an all-atom protein structure generation method. Find me there or DM me to chat about all things protein! Paper: https://t.co/Dgw5fkRDmD
Code is available at https://t.co/23m9u6SCeL 12/12
github.com: alexjli/terminator_public - Neurally-derived Potts models for protein design, inspired by dTERMen
I'd like to thank my advisors Amy and Gevorg for all their support and guidance, my co-authors Mindren, Israel, and @vikramsundar for being fun collaborators with great ideas, and @SassSeabass for teaching me all things dTERMen. Without them, none of this would have been possible! 11/
Lastly, we find a disconnect between NSR and energy-based metrics: Potts parameter regularization improves NSR but not energetics predictions, suggesting future directions for energy-based objectives when training and evaluating new protein design models. 10/
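In case the phrasing is opaque: "Potts param regularization" refers to penalizing the magnitude of the predicted Potts parameters during training. A minimal sketch of one common form (an L2 penalty; the exact penalty and weights used in the paper may differ):

```python
def potts_l2_penalty(h, J, lam_h=1e-4, lam_J=1e-3):
    """Hypothetical L2 penalty on predicted Potts parameters: h (per-position
    self energies) and J (pairwise couplings). The exact penalty and weights
    used for "Potts param regularization" in the paper may differ."""
    return lam_h * (h ** 2).mean() + lam_J * (J ** 2).mean()

# Illustrative usage during training:
#   loss = nsr_loss + potts_l2_penalty(pred_h, pred_J)
```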
Our models can also be improved in a generalizable fashion via finetuning on experimental data. Finetuning on the Bcl-2 affinity dataset improves performance on Rocklin stability predictions, even though the Rocklin dataset is composed of de novo folds. 9/
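For readers wondering what finetuning on experimental data can look like in this setting, here is a generic sketch under assumed interfaces (a `model(structure, sequence)` call returning a predicted energy, and batches carrying experimental measurements); it is not the actual training code from the paper:

```python
import torch

def finetune_step(model, batch, optimizer):
    """Generic finetuning sketch (assumed interface, not the paper's code):
    fit the model's predicted energies to experimental measurements
    (e.g. Bcl-2 binding affinities) with a simple regression loss."""
    optimizer.zero_grad()
    pred = model(batch["structure"], batch["sequence"])
    loss = torch.nn.functional.mse_loss(pred, batch["measurement"])
    loss.backward()
    optimizer.step()
    return loss.item()
```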
Additionally, despite not being explicitly trained to predict energies, our models perform well on energy-based tasks, including a Bcl-2 binding affinity dataset (Frappier et al., Structure 2019) and a de novo protein stability dataset (Rocklin et al., Science 2017). 8/
To assess fold specificity, we show that designed sequences (B) tend to fold into their target structures, as predicted by AlphaFold. The same trend is not observed for a randomized-sequence baseline (D). 7/
Via model ablations, we show that TERMs and the Potts model output both contribute to increasing NSR. In designed sequences, we also see physicochemically realistic substitutions when a non-native residue label is chosen. 6/
We present two models: TERMinator, which uses both TERMs and backbone coordinates as input, and COORDinator, which uses only coordinates. Both output protein-specific Potts models over sequence labels, which can be optimized for sequence design or used to predict mutation energies. 5/
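For readers who haven't met Potts models before, here is a minimal sketch of what a "protein-specific Potts model over sequence labels" is and how it scores point mutations. The array shapes, 20-letter alphabet, and function names are illustrative assumptions, not TERMinator's actual output format:

```python
import numpy as np

# Illustrative sketch: a Potts model over a length-L protein with a 20-letter
# amino-acid alphabet is defined by
#   h: (L, 20)         per-position self energies
#   J: (L, L, 20, 20)  pairwise coupling energies
# and assigns a sequence s (length-L array of residue indices) the energy
#   E(s) = sum_i h[i, s[i]] + sum_{i<j} J[i, j, s[i], s[j]].

def potts_energy(s, h, J):
    L = len(s)
    e = h[np.arange(L), s].sum()
    for i in range(L):
        for j in range(i + 1, L):
            e += J[i, j, s[i], s[j]]
    return e

def mutation_delta_energy(s, pos, new_aa, h, J):
    """Energy change for a single point mutation s[pos] -> new_aa.
    Only terms touching `pos` change, so the difference is cheap."""
    old_aa = s[pos]
    delta = h[pos, new_aa] - h[pos, old_aa]
    for j in range(len(s)):
        if j == pos:
            continue
        i, k = min(pos, j), max(pos, j)
        # order the residue labels to match the (i, k) indexing with i < k
        ai_new, ak_new = (new_aa, s[j]) if i == pos else (s[j], new_aa)
        ai_old, ak_old = (old_aa, s[j]) if i == pos else (s[j], old_aa)
        delta += J[i, k, ai_new, ak_new] - J[i, k, ai_old, ak_old]
    return delta
```

Minimizing E over sequences gives designed sequences; differences in E between mutant and wild type give the mutation energetics probed by the benchmarks elsewhere in the thread.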
In this work, we tackle these concerns by 1) using TERMs, small recurring local-in-space structural motifs, to implicitly model protein flexibility, and 2) predicting an energy landscape (a Potts model) over sequence space as output, rather than a sequence directly. 4/
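To make "an energy landscape as output rather than a direct sequence" concrete: once a Potts model (h, J) is predicted, design becomes a search for low-energy sequence labelings. Below is a toy greedy single-mutation descent, reusing the potts_energy/mutation_delta_energy sketch above; real design protocols typically use simulated annealing or MCMC, and the paper's procedure may differ.

```python
import numpy as np

def greedy_design(h, J, n_sweeps=10, rng=None):
    """Toy sequence design: start from a random sequence and repeatedly accept
    the best single-residue substitution until a sweep yields no improvement.
    Relies on mutation_delta_energy from the sketch above."""
    rng = rng or np.random.default_rng(0)
    L, A = h.shape
    s = rng.integers(A, size=L)
    for _ in range(n_sweeps):
        improved = False
        for pos in range(L):
            deltas = [mutation_delta_energy(s, pos, a, h, J) for a in range(A)]
            best = int(np.argmin(deltas))
            if deltas[best] < -1e-9:  # strictly lowers the energy
                s[pos] = best
                improved = True
        if not improved:
            break
    return s
```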
However, current models assume a static backbone with no structural flexibility, and they design sequences directly on that structure, which makes them difficult to adapt to energy-based questions or to sequence optimization under discrete constraints. 3/
Neural nets have been taking protein design by storm, with one of the most promising results being strong performance on native sequence recovery (NSR) tasks, where the goal is to predict the native sequence of a protein given only its backbone structure. 2/
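For anyone new to the metric, native sequence recovery is simply the fraction of positions at which the predicted sequence matches the native one; a minimal sketch (identifier names are mine):

```python
def native_sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue matches the native
    residue (sequences as equal-length strings or lists of residue labels)."""
    assert len(predicted) == len(native)
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)

# e.g. native_sequence_recovery("MKTAYIAK", "MKSAYIAR") -> 0.75
```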