I ❤️ proteins! Researching protein language models, equivariant transformers, LoRA, QLoRA, DDPMs, flow matching, etc. intersex=awesome😎✡️🏳️🌈🏳️⚧️💻🧬❤️🇮🇱
These two together make a really good pair:
From this you get conformational ensembles and binding affinities for protein-protein, protein-small molecule, and protein-nucleic acid complexes, reducing the need for expensive MD sims.
Found out yesterday some of my
@huggingface
blogs inspired some undergrads to start studying AI applied to proteins, and that someone applied for and received an internship based on their interest in replicating and extending some of them. 😎 Feeling very inspired and grateful now. ❤️
Just thought I would share this new Hugging Face community blog post I wrote as a follow-up to the ESMBind post. It explains how to build an ensemble of Low-Rank Adaptations (LoRAs) after you have finetuned multiple ESMBind LoRA models:
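One simple way to ensemble finetuned LoRAs is to average their low-rank updates into the frozen base weight. A minimal NumPy sketch — the dimensions, scaling factor, and averaging strategy here are illustrative stand-ins, not the blog post's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, n_adapters = 8, 2, 3           # hidden dim, LoRA rank, number of finetuned LoRAs
W0 = rng.normal(size=(d, d))         # frozen base weight (stand-in for an ESM-2 layer)
alpha = 4.0                          # LoRA scaling factor (hypothetical)

# Each finetuned model i contributes a low-rank update B_i @ A_i
adapters = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(n_adapters)]

# Ensemble by averaging the scaled low-rank updates into the base weight
delta = sum((alpha / r) * B @ A for B, A in adapters) / n_adapters
W_ensemble = W0 + delta

print(W_ensemble.shape)
```

The merged update has rank at most `n_adapters * r`, so the ensemble stays cheap relative to a full dense update.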
An interesting and novel approach to applying transformers to graph-structured data. This never got the attention it deserved and is likely an approach lost to time. It may be “old”, but it’s worth investigating further, especially for biochem/molecules:
Damn, another E(3)-equivariant model that should have been SE(3)-equivariant. Molecules have chirality! Still exciting that it works for small molecules AND proteins:
Has anyone else tried grafting two proteins together by first placing the proteins into AlphaFold-Multimer, then linking the proteins together with something like RFDiffusion motif scaffolding (treating the two proteins as though they are in the same chain)?
Equivariant Spatio-Temporal Attentive Graph Networks to Simulate Physical Dynamics: A Replacement for MD? TBD. More comments to come.
OpenReview:
GitHub:
BindGPT sounds pretty cool! No code though 😒 probably because they’re on to something with this one, especially when considering the performance and the inference cost drop together. High throughput is really needed for this problem.
Working on a new method to cluster protein-protein complexes so I can finetune ESM-2 on them for predicting PPIs and for generating binders 😊. Also may try to finetune EvoDiff this way for generating binders. I ❤️ proteins so much.
Here’s a new method for sampling the equilibrium Boltzmann distribution for proteins using GFlowNets:
If you aren’t familiar with GFlowNets, head over to
@edwardjhu
’s twitter and watch his video. I’ll also post a link to a related lecture soon.
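For the unfamiliar, the workhorse training objective in most GFlowNet work is the trajectory balance loss. A toy NumPy sketch with made-up numbers (not the paper's actual setup): the loss pushes the forward flow along each sampled trajectory to match the reward times the backward flow.

```python
import numpy as np

# Trajectory balance loss for one sampled trajectory (all values hypothetical)
log_Z = 1.5                          # learned log partition function estimate
log_pf = np.log([0.5, 0.4, 0.8])    # forward policy log-probs along the trajectory
log_pb = np.log([1.0, 0.5, 0.5])    # backward policy log-probs
log_reward = -2.0                   # log R(x) ~ -E(x)/kT for a Boltzmann target

# TB loss: (log Z + sum log P_F - log R(x) - sum log P_B)^2
tb_loss = (log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2
print(tb_loss)
```

Minimizing this over many trajectories drives the sampler toward drawing terminal states proportionally to the reward, i.e. the Boltzmann distribution when R(x) = exp(-E(x)/kT).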
Not specifically for proteins or other molecules, but this is a nice intro to flow matching. Thanks for the video
@ykilcher
any chance you’d ever do something on this applied to proteins?
Whenever an open source version of
#AlphaFold
3 is being created, be sure to try swapping out the diffusion module for a flow matching module. It’ll probably turn out better that way 😉
ESM-AA huh? Would it be better to use random-order autoregressive decoding (similar to LigandMPNN, for example) instead of MLM? It seems like a harder objective to train on, but you could end up with a better-performing model.
Let’s go!
“CRISPR-GPT leverages the reasoning ability of LLMs to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes.”
Shouldn't we be able to do something similar to this with LoRA?
LoRA and SVD are conceptually very similar. If so, that would likely explain the results in this paper, where LoRA turns out to be better than full finetuning. Thoughts?
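To make the analogy concrete: truncated SVD gives the provably best rank-r approximation of a dense update (Eckart–Young), while LoRA parameterizes its update as a product of two rank-r factors — the same kind of object, just learned directly. A NumPy sketch with random matrices, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4

# A full finetuning update (dense, generically full-rank)
dW_full = rng.normal(size=(d, d))

# Best rank-r approximation of that update via truncated SVD
U, S, Vt = np.linalg.svd(dW_full)
dW_svd = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

# A LoRA update has exactly the same form: a product of two rank-r factors
B, A = rng.normal(size=(d, r)), rng.normal(size=(r, d))
dW_lora = B @ A

print(np.linalg.matrix_rank(dW_svd), np.linalg.matrix_rank(dW_lora))
```

Both updates live in the same rank-r family; LoRA just searches that family by gradient descent instead of projecting after the fact.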
Apparently you can, in fact, do flow matching on discrete data. For those interested in diffusion applied to discrete data like language and NLP, this is a good reference for how to do it with the more general flow matching models:
Combining discrete and continuous data is an important capability for generative models. To address this for protein design, we introduce Multiflow, a generative model for structure and sequence generation.
Preprint:
Code:
1/8
Okay, hear me out…stratify an NLP dataset (or any other modality, really) by running a “homology search” with a BERT-style model, similar to this paper, but for non-protein data:
Could help determine the amount of generalization, no?
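A toy sketch of the idea, with random vectors standing in for BERT-style embeddings: greedily cluster by cosine similarity, then split at the cluster level so near-duplicates never straddle train/test. The threshold and sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for embeddings of dataset examples (real use: a sentence encoder)
emb = rng.normal(size=(100, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Greedy "homology" clustering: a point joins the first cluster whose
# representative is within the similarity threshold, else it starts a new one
threshold = 0.3
reps, labels = [], []
for v in emb:
    for ci, rep in enumerate(reps):
        if v @ rep >= threshold:
            labels.append(ci)
            break
    else:
        reps.append(v)
        labels.append(len(reps) - 1)
labels = np.array(labels)

# Hold out whole clusters for test, mirroring homology-based protein splits
clusters = rng.permutation(len(reps))
test_clusters = set(clusters[: len(reps) // 5])
is_test = np.isin(labels, list(test_clusters))
print(is_test.sum(), (~is_test).sum())
```

Comparing performance on held-out clusters versus a random split would give a rough measure of how much the model generalizes beyond near-duplicates.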
Are Kolmogorov-Arnold Networks (KAN) enough to address some problems in biochemistry that suffer from data scarcity? Apparently they require much less data to converge, and all they’re really doing is making activation functions trainable using B-splines.
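A stripped-down illustration of the KAN idea: a learnable activation expressed in a degree-1 B-spline (hat function) basis on a fixed grid. Real KANs use higher-degree B-splines plus a base activation; this toy keeps only the core mechanism of trainable coefficients at the knots:

```python
import numpy as np

# Knot grid and trainable coefficients for one KAN-style edge activation
grid = np.linspace(-2, 2, 9)                            # fixed knot positions
coef = np.random.default_rng(0).normal(size=grid.size)  # trainable parameters

def phi(x):
    # Hat (degree-1 B-spline) basis expansion = piecewise-linear interpolation
    # of the coefficient values at the knots
    return np.interp(x, grid, coef)

x = np.linspace(-2, 2, 5)
print(phi(x))
```

Training then amounts to backpropagating into `coef` (and, in a full KAN, into every edge's spline), which is part of why the parameterization can be so sample-efficient.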
The AI Revolution in Biology is here - it's just not evenly distributed, even among biologists
@amelie_iska
previews biology as an experimental information science, on the latest Cognitive Revolution – out now!
Listen to catch up!
(link in thread)
Interestingly, quantizing state space models like Mamba doesn't seem to work very well, whereas we are now in the era of 1-bit quantization for transformers ~without~ performance degradation; it also isn't clear if Mamba is as expressive as Transformers.
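For reference, the flavor of 1-bit weight quantization in question, sketched from memory in the BitNet style (treat the centering/scaling details as approximate): binarize the weights to {-1, +1} and rescale by a mean absolute value so the quantized matrix matches the original's magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))   # stand-in for a transformer weight matrix

# 1-bit quantization sketch: center, take signs, rescale by mean |W|
alpha = np.abs(W - W.mean()).mean()
W_q = alpha * np.sign(W - W.mean())

rel_err = np.linalg.norm(W - W_q) / np.linalg.norm(W)
print(rel_err)
```

The per-weight error is substantial, which makes it all the more striking that transformers trained with quantization in the loop barely lose accuracy — and that SSMs apparently don't enjoy the same robustness.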
RFDiffusion could’ve been used and it’s suggested this would improve outcomes. Why wasn’t it used for this problem? I’m curious. This would establish more use cases for RFDiffusion and similar methods (like flow matching with FoldFlow-2 for example).
(1/n) Even if Sora isn't currently capable of accurately generating simulations of small molecules or proteins, open sourcing it or giving select researcher access to it would allow us to add in equivariance or use components of it such as those that maintain temporal coherence.
Okay, serious question. If you can accomplish the same thing with more general proteins, why restrict yourself to antibodies? Also, what are some problems that really truly require antibodies specifically and that can’t be done with more general proteins?
Seems like an interesting method. Notably, it works better (SOTA?) if you give it conformational ensembles to work with. Could be worth seeing how conformational sampling, Distributional Graphormer, or AlphaFlow might yield better results.
Having a lot of fun visualising the ligand binding site predictions of
#IFSitePred
with
#PyMol
! A new ligand binding site prediction method that uses
#ESMIF1
learnt representations to predict where ligands bind! Check it out here:
#Q96BI1
@TonyTheLion2500
I highly recommend this reference along with his “smooth manifolds” book: Introduction to Riemannian Manifolds (Graduate Texts in Mathematics)
Key insight from recent events…patent the method not the molecule. Some of these AI methods are going to wreck patents imo. 🤫
That said…
MIT license >> patent
(for humanity…usually).
@SimonDBarnett
Code is linked to in the Nature paper. This AI model actually samples the Boltzmann distribution, giving all the metastable states (low energy conformations) as well as the transition pathways between them. It’s a “generative diffusion model”:
Goal: use partial diffusion and motif scaffolding to engineer a new version of nitrogenase then modify plant genetics to produce this new version so that chemical fertilizer is unnecessary.
To all those just getting into this stuff: You’re entering one of the most interesting and impactful areas at the most exciting time. Don’t give up, even when it feels impossible. Stay close to the open source biochem AI community. They’re a great crowd. Good luck and have fun!
Having solid temporal coherence, or modifying the architecture to be SE(3)-equivariant would allow us to create better versions of things like this:
and we might actually be able to replace MD with AI, speeding up drug discovery and solving major problems
Crowdsourcing suggestion…if you could selectively disrupt or augment a pathway or PPI network, where would you start? Assume you can block any PPI, or augment the PPI network by designing proteins that create intermediary interactions (ex: proteins that bind/link two others)
@alexrives
I have a method for detecting AI generated proteins that I would like to open source at some point if people are interested. It seems to work on proteins generated by most models out right now, although there are a couple models it does not work for, hesitant to say which ones.
Selectively modulating PPI networks by designing high affinity and high specificity binders with RFDiffusion and checking that with AF-Multimer LIS score seems like low hanging fruit to me. What reasons might there be for this not being very actively & heavily worked on?
Computational efficiency in equivariant models is often a concern. This model addresses that and creates fast SE(n)-equivariant models for tasks involving molecules:
@samswoora
You should also check out flow matching models. Flow matching generalizes diffusion (diffusion is a special case of flow matching). They're doing a lot with proteins and flow matching, but there's less buzz about it in vision and language domains.
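The basic flow matching recipe fits in a few lines: pick the linear (rectified-flow) path x_t = (1-t)·x0 + t·x1, whose target velocity is simply x1 - x0, and regress a network onto it. A toy NumPy version with a dummy zero network standing in for the learned model (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

x0 = rng.normal(size=(128, 3))         # noise samples
x1 = rng.normal(size=(128, 3)) + 5.0   # "data" samples (toy)
t = rng.uniform(size=(128, 1))         # random times in [0, 1]

# Linear interpolation path and its (constant-in-t) target velocity
x_t = (1 - t) * x0 + t * x1
v_target = x1 - x0

def v_theta(x, t):
    # Stand-in for a learned velocity network; a real model is trained so
    # that v_theta(x_t, t) regresses v_target
    return np.zeros_like(x)

loss = np.mean((v_theta(x_t, t) - v_target) ** 2)
print(loss)
```

Diffusion corresponds to a particular choice of path and noise schedule within this framework, which is the sense in which flow matching generalizes it.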
@maurice_weiler
@erikverlinde
@wellingmax
Could someone recommend a similar resource for other architectures like equivariant transformers or equivariance in geometric GNN models? Just curious what the go to resources are for people for other architectures.
@pratyusha_PS
This is awesome. When will the code be available? I would love to try this with a protein language model like ESM-2 and see if it improves performance.
🤔 Based on this lecture (see 20:15), I think cancer is a biological “vector bundle” with degenerate or absent transition functions. In other words, local data dominates and the cohesive global structure does not exist. I wonder how far the analogy goes.
@310ai__
It might also be good to look into computing the LIS score based on the PAE output of RoseTTAFold All-Atom, similar to what was done with AlphaFold-Multimer here. This is a new approach for protein-small molecule complexes.
Hot take for some, obvious to others: GPUs and LLM oriented ASICs along with AI operating systems will make CPUs mostly obsolete. Anyone out there capable of writing CUDA kernels who can explain why this might be an erroneous prediction?
This would be cool for proteins
I'd love to try and use this for designing protein-protein complexes in sequence space. Too bad the code isn't released.
Pretty neat. How does it compare to a contrastive model like ProteinDT or ProteinCLIP? And could we use it for annotating in order to train a new ProteinDT or ProteinCLIP? Is there an exceptional text-guided diffusion model coming soon for proteins?
@andrewwhite01
You can also learn equivariance. I think equivariance is an overrated mathematical concept tbh. It's fancy and neat from a mathematical perspective, but otherwise I think you could have your network learn it and get just as far if not further.
[CLIP] by Hand ✍️
The CLIP (Contrastive Language–Image Pre-training) model, a groundbreaking work by OpenAI, redefines the intersection of computer vision and natural language processing. It is the basis of many of the multi-modal foundation models we see today.
How does CLIP work?
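At its core: a symmetric InfoNCE loss over an image-text similarity matrix, with the matched pairs on the diagonal as positives. A NumPy sketch with random features standing in for the two encoders (dimensions and temperature are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

n, d = 8, 32
img = rng.normal(size=(n, d))   # stand-in for image encoder outputs
txt = rng.normal(size=(n, d))   # stand-in for text encoder outputs
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

tau = 0.07                      # temperature
logits = img @ txt.T / tau      # pairwise cosine similarities

# Symmetric contrastive loss: matched (diagonal) pairs are the positives
labels = np.arange(n)
loss_i2t = -log_softmax(logits)[labels, labels].mean()
loss_t2i = -log_softmax(logits.T)[labels, labels].mean()
loss = (loss_i2t + loss_t2i) / 2
print(loss)
```

Swap the image encoder for a structure or sequence encoder and you get the ProteinCLIP/ProteinDT family of ideas.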
Anyone who understands the current state of the art in pharmacogenetic testing for determining drug efficacy and side effects, please reach out for discussion or to share some papers. I have questions, and *potentially* a few good ideas on how to improve this.
Anyone have any idea why in silico directed evolution might increase perplexity and intrinsic dimension of a protein? Are more fit proteins generally more complicated?
I think actually training this model could be done on Lambda Labs for around $150K (20 GPU days on 256 A100s) no? There is a difference between training and inference too that should be made clear here. Inference (using the model for predictions) is much cheaper than training.
Because downloadable code would let 100s of scientists put AlphaFold3 through its paces on different kinds of systems -- for benchmarking and for developing new protocols and code that can access input parameters missing from the server (e.g., as ColabFold does).
@HannesStaerk
Still REALLY want to see this done with AlphaFold-Multimer. Maybe there’s a dynamic model of PAE and LIS that comes out of this that helps determine how strong or transient a PPI is.
@biorxiv_bioinfo
Cool idea, but how was the dataset split into train, test, and validation? Was sequence similarity/homology used to split the protein dataset? If not, this paper's results are unreliable. You have to split your data based on sequence similarity; 30% similarity is a pretty standard threshold.
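In practice you'd do the similarity split with a tool like MMseqs2 or CD-HIT. Here's a toy Python sketch of the idea using a crude k-mer Jaccard similarity as a stand-in for real sequence identity (the sequences and threshold are made up):

```python
import numpy as np

def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    # Crude k-mer Jaccard similarity; real pipelines use alignment-based identity
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / max(len(ka | kb), 1)

def cluster_split(seqs, threshold=0.3, test_frac=0.2, seed=0):
    # Greedy clustering, then hold out whole clusters for the test set so
    # homologs never straddle the train/test boundary
    reps, labels = [], []
    for s in seqs:
        for ci, rep in enumerate(reps):
            if similarity(s, rep) >= threshold:
                labels.append(ci)
                break
        else:
            reps.append(s)
            labels.append(len(reps) - 1)
    rng = np.random.default_rng(seed)
    test_clusters = set(rng.permutation(len(reps))[: max(1, int(len(reps) * test_frac))])
    train = [s for s, c in zip(seqs, labels) if c not in test_clusters]
    test = [s for s, c in zip(seqs, labels) if c in test_clusters]
    return train, test

seqs = ["MKTAYIAKQR", "MKTAYIAKQQ", "GAVLIMFWPS", "GAVLIMFWPT", "QQNHDEKRCY"]
train, test = cluster_split(seqs)
print(len(train), len(test))
```

The key property: near-identical sequences land in the same cluster and therefore the same split, so test performance reflects generalization rather than memorized homologs.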
AlphaFlow-Multimer with the appropriate generalization of the LIS score would more or less solve PPI prediction. LIS alone already mostly solves it. Then the only bottleneck for giant detailed PPI networks is compute. This is a big deal. Explain to me why I might be wrong.
I love this. Thank you! Gotta go watch now! p(shittakes | Michael Levin) << p(shittakes | Amelie Schreiber) 😂 Also,
@drmichaellevin
…feel free to DM anytime with project ideas 🤓
My take on "LoRA Learns Less and Forgets Less"
1) "MLP/All" did not include gate_proj. QKVO, up & down trained but not gate (pg 3 footnote)
2) Why does LoRA perform well on math and not code? lm_head & embed_tokens weren't trained, so domain shifts aren't modelled. Also reason why
It would be very interesting and useful to see how this could be used in tandem with the following method for detecting binding sites of conformational ensembles of proteins using ESM-IF1:
@ZymoSuperMan
RFDiffusion works with structures, not sequences. For designing sequences that fold into the backbones that RFDiffusion generates, you’ll need something like LigandMPNN, which does allow for things like biasing particular residues in various ways to constrain the sequences designed.
After that, team up with
@BakerLaboratory
and make the best “RFDiffusion” and “LigandMPNN” anyone’s ever seen, but this time use continuous and discrete flow matching resp. and make it for all the biomolecules. 4 essential “foundation models” and we’ll be all set +/-ε 🎉😎🧬
@lpachter
For academic uses that don’t compete with Isomorphic Labs’ research… that part is subtle, but important. It means if you want to develop a new drug and have any hope of taking it to market, or if you’re not in academia, you’re out of luck. And no reproducing because patents! 😒
Really cool channel. Maybe we’ll get a video on SE(3)-equivariant neural networks one day🤞This would be great for folks trying to understand new SOTA models for proteins and small molecules. I would totally be down to collaborate
@mathemaniacyt
🧬
Why do we require the Jacobi identity to be satisfied for a Lie bracket? In the process, we also understand intuitively why tr(AB) = tr(BA) without matrix components.
Watch now: