Aiden Kolodziej
@aidenosinetrip1
Join us Wednesday, December 10th, for an amazing seminar by @ChoYehlin to cap off 2025: "How AF3-Style Structure Prediction Models Can Be Used for Protein Design: BoltzDesign and Protein Hunter". See you at 7pm EST in Room 6055, Longwood Center @DanaFarber. https://t.co/E8bDyGzirb
bpdmc.org
introduction and membership Boston Protein Design and Modeling Club (BPDMC) is a community of computational protein engineers and modelers from both academia and industry. While we are based in...
Another example showing how a common SH3 fold may only match at the CATH "class" level. If you're splitting your train/test by topology...you've got data leakage 🚰
One example from CIRPIN (https://t.co/SMO3AOX9xe) where two very similar structures have entirely different CATH classifications
🚨 For those training DL models on proteins: your "structural" train/test split might have leakage, because tools like foldseek/TMalign (and the CATH/SCOP databases) do not always account for the structural relationship between circularly permuted proteins:
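To make the leakage point concrete, here is a toy, sequence-level sketch (not the CIRPIN method, which works on structures). It assumes an exact circular permutation of a made-up sequence; the function names and the example sequence are hypothetical. It shows how two chains can carry identical content while a position-by-position comparison sees little in common:

```python
def is_circular_permutation(a: str, b: str) -> bool:
    """True if b is a rotation of a (exact-match toy case only)."""
    # Any rotation of a appears as a substring of a concatenated with itself.
    return len(a) == len(b) and b in a + a


def positional_identity(a: str, b: str) -> float:
    """Fraction of positions with the same residue, without any alignment."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


# Hypothetical toy sequence, cut after residue 6 and re-joined end to start.
original = "MKTAYIAKQRQISFVK"
permuted = original[6:] + original[:6]

print(is_circular_permutation(original, permuted))  # True
print(positional_identity(original, permuted))      # well below 1.0
```

Real circular permutants are rarely exact rotations, so a check like this would not replace a structure-aware comparison; it only illustrates why order-dependent similarity can miss the relationship.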
@niopeklab We've also developed a Colab Notebook where you can try out CIRPIN for yourself! If you're looking for remote homologs that may be related by CP, insertions, extensions, rewirings, or other rearrangements, try it out and let us know how it worked! https://t.co/hiONSCJWE9
github.com
Source code for CIRPIN: Learning Circular Permutation-Invariant Representations to Uncover Putative Protein Homologs - aidenkoloj/CIRPIN
Excited to share this work with @yoakiyama @ChoYehlin @jajoosam @sokrypton We find that protein language models trained solely on individual protein sequences implicitly learn the interface contacts of homo-oligomeric assemblies! As the model scales up, more interface signals
🚀Sergey Ovchinnikov's paper is here!! Can a deep learning model be trained to recognize proteins with identical 3D structures but different sequence connectivity, revealing thousands of hidden evolutionary relationships? "CIRPIN: Learning Circular Permutation-Invariant
@niopeklab If you're interested in circular permutation and remote homolog search, there's more to the story here:
biorxiv.org
Protein structure-based homology detection has been revolutionized by deep learning methods that can rapidly search massive databases. However, current structural search tools often miss proteins...
@niopeklab In addition to CPs, CIRPIN allowed us to uncover more complex topological rearrangements. We identified cases of rewiring, where the connectivity of secondary structures differs, as well as pairs of similar proteins obscured by insertions/extensions:
@niopeklab Investigating the node-level embeddings of CIRPIN/Progres revealed that CIRPIN captures tertiary motifs. CIRPIN identifies regions of similarity within proteins that Progres missed. 3cbn is an interesting example since the two halves of the protein are nearly identical:
@niopeklab Shifting to the model interp side of things: PCA of CIRPIN/Progres embeddings shows how CIRPIN groups CPs of the same structure together. This can be thought of as a "folding" of the embedding space:
@niopeklab There are a lot of interesting evolutionary q's to dive into here, saved for another thread. But it's worth highlighting that there have been decades of very detailed work on PDZ domains. How might knowledge of these four CPs change what we know about PDZ form and function?
@niopeklab We then used CIRPIN/Progres to investigate the PDZ topology in the AFDB cluster representatives and found that PDZs exist in 4 circularly permuted forms:
Among the pairs we discovered in SCOP were PDZ domains, recently reported by @niopeklab to be the most frequently inserted domains in the AFDB
We could then use a contrastive approach to identify novel CPs by searching for pairs with high CIRPIN scores and low Progres scores
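A minimal sketch of that contrastive filter, assuming both models give a similarity score per structure pair. The thresholds, the dictionary layout, and the placeholder IDs are illustrative assumptions, not values from the paper:

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def candidate_cps(
    cirpin: Dict[Pair, float],
    progres: Dict[Pair, float],
    high: float = 0.8,  # hypothetical cutoff for a "high" CIRPIN score
    low: float = 0.5,   # hypothetical cutoff for a "low" Progres score
) -> List[Pair]:
    """Return pairs scored high by CIRPIN and low by Progres, largest gap first."""
    hits = [
        pair
        for pair, score in cirpin.items()
        if pair in progres and score >= high and progres[pair] <= low
    ]
    return sorted(hits, key=lambda p: cirpin[p] - progres[p], reverse=True)


# Placeholder IDs and made-up scores, for illustration only.
cirpin_scores = {("a", "b"): 0.95, ("a", "c"): 0.90, ("b", "c"): 0.40}
progres_scores = {("a", "b"): 0.30, ("a", "c"): 0.85, ("b", "c"): 0.35}

print(candidate_cps(cirpin_scores, progres_scores))  # [('a', 'b')]
```

Pairs that both models score highly are ordinary structural neighbours; only the disagreement between the two scores flags a putative CP in this sketch.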
Training with synCPs allows our model, CIRPIN, to find similarity between known circular permutants that Progres previously failed to identify:
Think of how comma placement in English can drastically change the meaning of a sentence. A panda can be a cute, docile animal or a serial killer, depending on your inclusion of a comma ("eats shoots and leaves" vs. "eats, shoots and leaves").
Under the hood, synCP generation is simply shifting the positional information of each structure. But the consequences are significant.
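A rough sketch of what "shifting the positional information" could look like, assuming a structure is just an ordered list of per-residue coordinates. The function name and the toy coordinates are hypothetical, and the real CIRPIN code may treat the new termini differently:

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def make_syn_cp(coords: List[Coord], cut: int) -> Tuple[List[Coord], List[int]]:
    """Reorder residues as cut..N-1 followed by 0..cut-1.

    The 3D coordinates are untouched; only each residue's position in the
    chain (its sequence index) changes.
    """
    n = len(coords)
    if not 0 <= cut < n:
        raise ValueError("cut must satisfy 0 <= cut < number of residues")
    order = list(range(cut, n)) + list(range(cut))
    return [coords[i] for i in order], order


# Toy 5-residue "structure" laid out along the x axis.
coords = [(float(i), 0.0, 0.0) for i in range(5)]
permuted, order = make_syn_cp(coords, cut=2)

print(order)        # [2, 3, 4, 0, 1]
print(permuted[0])  # (2.0, 0.0, 0.0)
```

The set of coordinates is identical before and after, which is the point: the shape is the same and only the chain order differs.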
What happens during training is that random synCPs of input structures are generated, forcing the model to learn that structures related by CP are similar:
Building off the Progres model by @jgreener64 @jamaliki1998, we introduced a novel data augmentation strategy using synthetic circular permutations (synCPs)
We wondered if there was a way to combine the speed of deep-learning-based search tools with the sensitivity of traditional structural alignment.