We have an open postdoc position in the area of protein structural bioinformatics. Areas of interest include proteome-wide models, variant effect prediction, molecular evolution, allosteric regulation, protein/drug design. Deadline 20th July
Our community effort to study AlphaFold2 applications is now published. Beyond the science, what I want to highlight is that this was possible because scientists were sharing early results on Twitter, and the paper is the result of putting that together.
AlphaFold3 is out with improvements on structural models that include DNA/RNA and small molecules. Unfortunately, there is no code, no binary to run at scale and only a limited webserver. Why even publish?
After 9 years of @embl, I am thrilled to say that next Jan I will join the faculty of @ETH_DBIOL at ETH Zurich with research at @IMSB_ETH. I am really excited about the move, my colleagues at IMSB and more broadly the amazing local research environment
We joined a large community effort to assess diverse applications of AlphaFold 2 in the context of novel structural elements; missense variants; function and ligand binding sites; modelling of interactions and experimental structural data. Some highlights below:
Our collaboration with
@arneelof
's lab exploring the use of AlphaFold2 for large scale prediction of structures of human complexes is now published
@davidfburke
@Patrick18287926
et al.
How many different protein shapes are there in 200 million predicted AlphaFold structures? In a joint project with @thesteinegger we (@ibarrioh, @jgyyy15 et al) clustered these into around 2 million groups to study their novelty and evolution
Jointly with
@arneelof
's group we applied Alphafold to predict complex structures for 65,000 human protein-protein interactions. Here is what we learned in trying to generate a structurally resolved human interaction network
Less than 5% of human phosphosites have a known function. Also published today -
@d0choa
et al. have set out to create a high quality catalog of human phosphosites and an integrated score that can rank phosphosites critical for the cell
@NatureBiotech
Yesterday's
@emblebi
training webinar on "How to interpret AlphaFold structures" is now online. It is oriented towards users with practical explanations about the method, the confidence scores, the database and several use case applications
Can anyone tell the EU that Marie Curie fellowship applications are ridiculously full of crap? Talk about taking the joy out of science for early career scientists
I cleared my office at
@emblebi
and I am moving out of the UK next week.
@embl
is a one of a kind place to start a research group and I am privileged to have had this opportunity. I owe one final entry to the EMBL chapter of this blog series:
How does SARS-CoV-2 take over kinase signalling to subvert normal cell function and what drugs can we use to counter these changes? Our new work with
@KroganLab
and a large international group of scientists, spearheaded by
@DoctorBou
@danishm20
and others
Now in peer-reviewed form - our study on network propagation of GWAS linked genes across >1000 human traits identifies aspects of cell biology that impact on multiple traits
@ibarrioh
et al. an
@OpenTargets
project
I am introducing the hype cycle to students as a tool to manage expectations for hyped up technologies and concepts. Here is a suggestion for Bio 2018.
Phosphorylation based signalling is still quite understudied despite a general perception that it is well understood. A great study here measuring specificity for human serine/threonine kinases
My typical working pattern: unproductive relaxed period -> accept unrealistic amounts of commitments -> stress with the pile of work and deadlines -> highly productive and stressful work period -> reject any work commitment I can while criticizing my past self -> repeat
New lab preprint - Deep mutational scanning can tell us about the function of individual protein positions.
@Ally_Dunham
normalised DMS data for 6291 positions in 30 proteins to ask how many amino acid functions are there and how frequently are they used
Marta Strumillo's main PhD work now online - she used 530,000 phosphosites from 40 eukaryotic species to search for "ultra-conserved" phosphorylation within globular protein domains. Such hotspots of phosphorylation are enriched for functional regions
New AI methods have led to the prediction of structures for hundreds of millions of proteins but these have to be explored. Together with
@thesteinegger
we clustered the 220 million structures in the
@emblebi
Alphafold DB to study their evolution and degree of novelty. See 🧵
We clustered the
#AlphaFold
structure database with our novel Foldseek algorithm. We identified 2.27M clusters and analyzed them by function, annotation, domains and evolution. Amazing collaboration with
@pedrobeltrao
lab. 1/n
📄
💾
new lab preprint - using proteomics and protein co-variation to infer protein association networks across 11 human tissues to study what determines tissue differences and to prioritize disease genes in GWAS linked genes through tissue specific networks
Starting a project in single cell genomics these days feels like trying to jump onto a high speed train as it passes by you. This community could take a 2 year break and let the rest of us assimilate some of it.
Now in published form
@Cris_Vieitez
Bede Busby et al. on an experimental method to functionally annotate protein phosphorylation, using the power of yeast genetics (with
@TypasLab
,
@savitski_lab
and others at
@embl
)
EMBL will host a networking event for The Next Generation in Infection Biology. Apply if you are looking for PI positions in the area. >30 leading organisations will take part and registered their interest in you
(26-27 Jan 2022)
#EMBLInfectionBiology
Our institute
@emblebi
is looking to hire a group leader in computational biology (broad search). This comes with very generous core funding, access to HPC/GPU compute and the outstanding collaborative potential of EMBL, Sanger and Cambridge research
50 years ago the carnation revolution in Portugal led to the end of the longest running dictatorship in Europe. May this anniversary remind us that these freedoms we enjoy cannot be taken for granted
#25deabril50anos
Our (
@jurgjn
) review on recent progress in deep learning models for protein structure prediction and design is out in
@MolSystBiol
. This is meant as a primer for those outside wanting to get an update on what is going on with some thoughts on the future
Submitted the 10th preprint of our group to
@biorxivpreprint
today and it is fantastic that this is a common practice in a growing number of fields.
Years of research remain hidden from view without them. Don't let others live in your past. Preprint
Large curation effort to compile 28,000 annotations describing the effects of genetic variation on physical protein interactions. A great resource for method development and annotation of natural/disease variants.
One final alphafold thing :) -
@Ally_Dunham
compared deep mutational scanning data with predicted changes in stability for mutations on the alphafold models (with FoldX). The correlations observed are typically as good as or better than with experimentally derived structures
Our group is now formally affiliated with the Swiss Institute of Bioinformatics
@ISBSIB
. I look forward to contributing to the continued development of bioinformatics in Switzerland and to getting to know our colleagues in joint meetings
Exciting day for (structural) biology! The peer reviewed publications for AlphaFold 2 and RoseTTAFold are out and both with code available. Here is the AlphaFold publication describing the amazing progress made by DeepMind in protein structure predictions
The last PhD project from
@Ally_Dunham
was a collaboration with
@MoAlQuraishi
on a convolutional neural network model for protein variant effect prediction. It achieves fast effect prediction without alignments.
I am very thankful to the Helmut Horten Foundation, which has supported my professorship appointment at
@ETH
. Through this I join
@theLoopZurich
, a medical centre for translational research and precision medicine
How I wish society and our politicians would realise the investment opportunity of increasing the ERC budget. A 10% funding rate leaves so much fantastic science unfunded and it makes the selection process less fair.
EMBL and
@DeepMind
have partnered – a breakthrough for science.
Together, we're providing a treasure trove of protein structure predictions powered by
#AlphaFold
to herald a new era for
#AI
-enabled biology.
our latest preprint:
@omarwagih
"Comprehensive variant effect predictions of single nucleotide variants in model organisms"
Describing the resource for H. sapiens, S. cerevisiae and E. coli
Underneath the joy of seeing humanity make progress in solving a fundamental problem in biology I have some mixed feelings that a company leapfrogged over the academic efforts in 2 years. It does not say much about the academic model on its own but it is another data point.
Now published in
@PLOSBiology
-
@dbradley534
's study of the evolution of kinase-substrate active site recognition. When during evolution did kinases learn to recognise different types of motifs? (quick summary and future directions)
More cancer cell line screening data from
@CancerDepMap
at the
@broadinstitute
4,518 drugs tested across 578 (pooled) human cancer cell lines in a single dose and 1,448 positives re-screened in 8-point dose response. Many new anti-cancer activities.
Very detailed analysis on using AlphaFold to predict interactions between domains and linear motifs from Katja Luck's lab, including some ideas on doing fragment screening.
A tiny contribution of our lab to this amazing effort to identify SARS CoV2 human protein interactions with a goal of finding new drugs. Achieved at an unprecedented speed for such a study.
Our paper is out on
@biorxivpreprint
describing our SARS CoV2-human protein-protein interaction map and drug predictions from the data. It was an honor to work with so many fantastic scientists around the world. -Nevan
Proteins that interact together are often relevant for the same traits and network analyses of GWAS genes can help identify trait relevant cell biology. Here
@ibarrioh
applied this to study 1002 traits defining a pleiotropy map of human cell biology
Cancer is more than its DNA. Now in published form, our study of predicted kinase/TF activities from multi-omics data, with a striking result being a lack of expected associations between mutations and activities
@abelfsousa
with
@DugourdAjf
@saezlab
@e_petsalaki
@danishm20
Watching Janet Thornton's farewell EMBL seminar - 50 years of enzymes. Besides being such an exceptional scientist, Janet is also an inspiring example of leadership. Follow the link to know more about her contributions to science:
New lab preprint - Marta Strumillo's main PhD project where she used 500,000 phosphosites from 40 eukaryotic species to find regions with conserved phosphorylation within protein domain families. Thread with key findings:
How often does a phenotype of a gene loss-of-function (LoF) depend on the genetic background?
Now in published form - work from
@mgalactus
Bede Busby et al. studies the changes in gene deletion phenotypes across strains of S. cerevisiae
Companies like Deepmind do not have any requirements to make their research available. I signed the letter below because they chose to publish their work and different standards were applied to the Alphafold 3 publication than those applied to others.
We were incredibly disappointed with the lack of code or executable accompanying the publication of AF3 in
@Nature
. This contradicts scientific principles of the ability to evaluate, use, and build upon existing work.
An "all-by-all" reference map of protein interactions using so called binary approaches (not pull downs). Some examples of how to use the reference with a focus on tissue specific protein interactions.
I managed to recapitulate a couple of phosphorylation dependent conformation switches with Alphafold3. Totally meaningless without proper benchmarking with larger numbers and control of what is being used as templates.
AlphaFold predicted structures for the full protein universe (200 million). Quite a lot of redundancy in the set but so many discoveries can be powered by this. Novel enzymes, evolution of life, possibly protein dynamics from ensembles. The hard part is how to even analyse it all
Today in partnership with
@emblebi
, we’re releasing predicted structures for nearly all catalogued proteins known to science, which will expand the
#AlphaFold
database by over 200x - from nearly 1 million to 200+ million structures: 1/
Here is a simple test of AlphaFold 3 for a multi-subunit complex that was not in the training data and has low homology to anything in the training data. The prediction is highly accurate but there is no way of knowing what templates are used during model prediction on the server.
There is a group leader opening (equivalent to full prof) in the Biomedical institute of University of Aveiro, Portugal PDF of call: Beautiful small town with a young and vibrant university. If anyone is curious I can tell you more
I am pretty excited that our team with
@CorreiaMeloC
and
@KasperFugger
is among the selected to receive an HFSP research grant. We will be studying bioactive modified nucleotides and their modifying enzymes. It will also be great to be back at HFSP meetings.
DeepMind builds a variant effect predictor - using multiple sequence alignment and training on allele frequency. A different approach but not a big jump in accuracy.
I had a great day today at
@IGCiencia
where I will be periodically spending some time establishing collaborations along shared interests and helping to promote the application of computational approaches in life science research.
So happy to host the 1st
#GulbenkianSeniorFellow
@pedrobeltrao
. Pedro will regularly spend time at
@IGCiencia
to promote excellent science using computational biology approaches within the IGC and in Portugal aligned with IGC’s vision and mission!
In another effort by the QBI Coronavirus Research Group, we and many others have studied the commonalities of host targeted mechanisms of SARS-CoV-2, SARS-CoV-1 and MERS-CoV, identifying drug targets with potential pan-coronavirus activity
#QCRGglobal
We have a postdoc position open at
@IMSB_ETH
for someone with experience in phosphoproteomics in areas related to the function, dynamics, evolution and misregulation of PTMs (apply by March 20th, flexible starting date)
#TeamMassSpec
A warm welcome to our new Advisory Editorial Board members, we are looking forward to working with you! 🎉
And... a good occasion to thank our entire AEB for their support!
#SysBio
#SystemsBiology
Building phylogenetic trees guided by predicted protein structures, made possible by describing proteins with the Foldseek structural alphabet. I think this will also have an impact on ancestral sequence/structure reconstruction.
In a post
#AlphaFold
world, we can use protein structures in ways we never could before. Can we build phylogenies with them? Are they any good? Yes! Foldtree surpasses traditional sequence-based methods, even for closely related proteins.👇
Thousands of phosphosites have been discovered with <5% having a known function. In our new preprint
@d0choa
uses a machine learning approach to address this gap
If you care what phosphosites may regulate your protein/process of interest read on
The
@emblebi
Alphafold database was just expanded to double the size with new predictions for most of the manually curated
@uniprot
entries. It will continue to expand over next year to cover millions of protein sequences.
🎉New
#AlphaFold
data! With
@DeepMind
, we’ve more than doubled the size of the database & added predictions for most of the manually-curated
@uniprot
entries in UniProtKB/SwissProt.
That's >400,000 new protein structure predictions for you to explore!
New blog post: State of the lab 9 - an informal report on the 9 years of EMBL-EBI. Part of a yearly series on running a lab in academia. This one looks back at the 9 years of EMBL, mostly with numbers, including the finances.
My group at
@IMSB_ETH
ETHZ is recruiting a lab technician/manager with a PhD in S. cerevisiae genetics / cell bio. ~60% time devoted to research on the functional relevance of PTMs and genetics of trait variation and ~40% on lab management
Re-watched Contagion and remembered how ridiculous I had found the pace of research in the movie. Impressive how during this pandemic the science has actually moved at sci-fi movie speed.
Getting closer to nanopore protein sequencing. I wonder what the scale of this could be in theory and how it would compete with mass spectrometry if it does work
The last PhD project from
@Ally_Dunham
is now out in published form. A collaboration with
@MoAlQuraishi
where Ally developed a very fast protein missense variant effect predictor using a convolutional neural network model.
Large scale genetic interactions in human with double CRISPRi (472×472 genes x2 cell lines). Besides basic biology findings this has useful applications for design of drug targets in cancer (second hit for LoF)
Another large cancer proteomics dataset. If you are not counting this is now over 1000 cancer samples with protein levels and nearly 1000 samples with phosphoproteomic data in CPTAC
Large cancer studies have been sequencing based despite it being a signalling disease. Here
@abelfsousa
compiled matched genomic and (phospho)proteomic data for 1,110 tumours and estimated activity changes in ~500 kinases/TFs based on their targets.
UnderwheLLM: "Reviewer 2" AI that hates your work!
I have shared the idea before, but now I made a HuggingChat Assistant that will write a negative peer review report based on any abstract you submit to it.
In February of last year Roman Cheplyaka (
@bioinfochat
) recorded a chat with me,
@janani_hex
and
@_amelie_rocks
on AlphaFold 2 but then his country was invaded. Somehow he managed to find time (!) to edit the first episode which you can find here
We are having our group retreat in Bergamo, celebrating almost 11 years of the life of the group. We are now through the second major turnover and looking ahead to 4-5 years with many newly joined members with projects in their growing phase. Exciting times ahead.
On behalf of
@embl
and Portugal's Presidency of the European Council I am excited to invite you to the upcoming meeting on Emerging Infectious Diseases, focused on the current SARS-CoV-2 pandemic (free event, 27th of April)
#EU2021PT
New preprint
@Cris_Vieitez
Bede Busby et al. on our biggest effort to date to functionally annotate protein phosphorylation, using the awesome power of yeast genetics (with the Typas and
@savitski_lab
and others at
@embl
)
Now in published form -
@Ally_Dunham
combined experimental outcomes of mutations for 6291 positions in 30 proteins, showing these can group into different functional groups. Ally then studied these in the context of protein structures and evolution.
Together with the lab of
@mjamorim1
at
@IGCiencia
we are hiring 2 postdoctoral researchers in Lisbon - bioinformatics (structural, networks, omics) to study viral proteins/networks and to help develop new therapeutics. Application instructions (PDF !)
We are playing with the alphafold models like many others. Here are some generic issues, using EGFR as an example, based on work by
@jurgjn
who is evaluating the models for pocket detection and docking.
Cancer mutations have been suggested to have the potential to change physical interactions. This work shows clear evidence of how this can cause changes in signalling networks
Now in published form -
@danishm20
identified reproducible associations between gene copy number variation (CNVs) and phosphorylation changes across tumours. (jointly with
@MillerLab4
)
Registrations open:
@EMBO
Seefeld signalling meeting (11-16 Sep). Bringing together people with different mindsets (structures to organism, systems to engineering). It has a great list of invited speakers and many talks will be selected from abstracts
The increased focus on exome sequencing and rare coding variants with higher effect sizes is going to shift human genetics into protein and cell biology (vs. gene expression regulation). I find that exciting as it is easier to follow up and more likely to result in therapies
A proteomics approach to identify interactions regulated by phosphorylation. Part of the growing arsenal of tools to annotate the functional importance of protein phosphorylation
Proteomic study of 202 human iPSC lines with matched mRNA and genome information:
- eQTLs can result in trans protein QTLs via protein interactions
- pQTLs are more enriched in disease-linked GWAS variants than eQTLs
@OliverStegle
& Lamond labs
Finally, this is very much a
@Twitter
paper. It was born from the early access of results here and was initially coordinated here before we moved to google docs and email.
It was certainly an interesting experience having to be back in the job market again as my 9 years at EMBL were getting closer to the end. I wrote a blog post about the move, the interview process and what is on my mind about a new start.