Cedric C.S. Tan
@cedriccstan
Followers: 254 · Following: 172 · Media: 32 · Statuses: 190
PhD student at @TheCrick and @UGI_at_UCL. Computational biologist working broadly on pathogen (meta)genomics.
Joined September 2021
Pleased to share that this project from my PhD is now published in @NatureEcoEvo! Many thanks to my amazing supervisors @BallouxFrancois and @LucyvanDorp - I could not have done it without them. Link to paper -> https://t.co/bdomUdp8YP (1)
New preprint (https://t.co/Yzh5zdQSrk) out! We explore the following: 1. What is the current state of viral genomic surveillance? 2. Do we give more viruses to animals than they give to us? 3. Why are multi-host viruses more at risk of jumping into humans? Summary🧵 below (1)
New preprint analysing microbial signatures of Hospital-Acquired Pneumonia (HAP). HAP is a major cause of morbidity and mortality, yet it remains poorly defined microbiologically. We profiled the respiratory microbiomes of 250 HAP patients in the UK. Metagenomic sequencing detects
Glad to be involved in this work on the reductive evolution of Borrelia, now out in @ScienceMagazine. Led by @Pooja_Swali and @LucyvanDorp. Congrats to all co-authors! https://t.co/yZ6vZi95Ps
science.org
Several bacterial pathogens have transitioned from tick-borne to louse-borne transmission, which often involves genome reduction and increasing virulence. However, the timing of such transitions...
This random BAM file on the SRA was mysteriously breaking my pipeline. Why? It was full of someone's ls -l output 🫠
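One cheap guard against this kind of surprise is to check a file's magic bytes before handing it to downstream tools. Below is a minimal standard-library sketch; the function name `looks_like_bam` is hypothetical. It relies on BAM being BGZF-compressed (a chain of ordinary gzip members) with a decompressed stream that starts with the bytes `BAM\1`.

```python
import gzip

# A BAM file is BGZF-compressed; BGZF is a chain of standard gzip members,
# so Python's gzip module can decompress its first few bytes.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path: str) -> bool:
    """Return True if the decompressed file starts with the BAM magic bytes."""
    with open(path, "rb") as raw:  # a missing file still raises FileNotFoundError
        try:
            with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
                return gz.read(4) == BAM_MAGIC
        except (OSError, EOFError):
            # Not gzip at all (e.g. plain-text `ls -l` output) or truncated.
            return False
```

Running a check like this on each downloaded file would flag a plain-text impostor before it reaches a BAM parser.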
I am excited to share our work "Systematic discovery of antibacterial and antifungal bacterial toxins" that is published in @NatureMicrobiol
https://t.co/bSqF1RUD99
Many thanks and congratulations to my co-authors: Marina Escalera-Zamudio, Alexei Yavlinsky, @LucyvanDorp and @BallouxFrancois! 19/
Nevertheless, our results highlight the value of using intrahost dynamics to predict mutation success, which we think can be easily ported to other pathogen systems. 18/
Our current models are far from perfect, so maybe we could include other evolutionary, immunological and epidemiological predictors of mutation fitness. 17/
Overall, we show that the intrahost diversity of viral infections, when combined with other genetic and phenotypic effects, could be used to predict the future fitness of mutations. 16/
And indeed, adding genetic linkage into our models improved their predictions, especially for the fitter mutations. The SHAP analyses confirm that our linkage predictors are the reason for this improvement. 15/
We thought that perhaps the missing link is genetic linkage, or the co-occurrence of mutations, which in some cases may boost their fitness (i.e., epistasis). 14/
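The thread does not say how linkage was encoded as a predictor. As one hypothetical way to quantify co-occurrence, the sketch below computes the classic r² linkage disequilibrium between two mutations from their presence or absence across sequencing libraries. The function name, the 0.05 presence threshold and the presence/absence encoding are all assumptions.

```python
from typing import Sequence


def linkage_r2(freqs_a: Sequence[float], freqs_b: Sequence[float],
               min_freq: float = 0.05) -> float:
    """r^2 linkage between two mutations from presence/absence across libraries.

    A mutation counts as present in a library if its intrahost frequency is
    at least `min_freq` (a hypothetical threshold).
    """
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("need two equal-length, non-empty frequency lists")
    n = len(freqs_a)
    has_a = [f >= min_freq for f in freqs_a]
    has_b = [f >= min_freq for f in freqs_b]
    p_a = sum(has_a) / n
    p_b = sum(has_b) / n
    p_ab = sum(a and b for a, b in zip(has_a, has_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # present everywhere or nowhere: no linkage signal
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom
```

A value near 1 means the two mutations almost always appear together; a value near 0 means they occur independently.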
However, we noticed that the prediction errors for our models tended to be higher for the fitter mutations, suggesting that perhaps our models are still missing something... 13/
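A minimal sketch of the kind of check described here: sort mutations by observed fitness, split them into equal-sized groups, and compare the mean absolute error within each group. The grouping scheme and function name are illustrative assumptions, not the authors' analysis.

```python
from statistics import mean
from typing import List, Sequence


def mae_by_fitness_group(observed: Sequence[float], predicted: Sequence[float],
                         n_groups: int = 2) -> List[float]:
    """Mean absolute error within equal-sized groups of increasing observed fitness."""
    if len(observed) != len(predicted) or len(observed) < n_groups:
        raise ValueError("need matching inputs with at least one point per group")
    pairs = sorted(zip(observed, predicted))  # lowest observed fitness first
    n = len(pairs)
    errors = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        errors.append(mean(abs(obs - pred) for obs, pred in chunk))
    return errors
```

If the last entry is consistently the largest, the model underperforms on the fittest mutations, which is the pattern the thread reports.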
The patterns of other physicochemical and phenotypic predictors also reflect important evolutionary concepts, but I won't go into detail here. Importantly, the model interpretation analyses suggest that our models were picking up on biologically relevant patterns. 12/
Based on the SHAP analysis, our models predict higher fitness values for mutations with higher max. intrahost frequencies. In other words, mutations with a high intrahost frequency are also more likely to be fitter in the future. 11/
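One simple way to check the direction of a relationship like this is a rank correlation between a feature's values and its SHAP values across mutations: a positive Spearman coefficient means higher feature values tend to push predictions up. The sketch below is a hypothetical standard-library helper, not part of the `shap` package.

```python
from statistics import mean
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)
```

Here `x` would hold each mutation's maximum intrahost frequency and `y` the corresponding SHAP values for that feature.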
To understand what our models were actually learning, we employed the SHAP model explanation framework developed by Lundberg et al. Maximum intrahost frequency was consistently the most important feature. 10/
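To illustrate what a SHAP value is, here is a brute-force Shapley computation for a single prediction, plus the usual global importance summary (mean absolute SHAP value per feature). This is a sketch under assumptions: features left out of a coalition are replaced by a single baseline vector, which is only one of several conventions, and the cost is exponential in the number of features. For tree ensembles such as XGBoost, the `shap` library's `TreeExplainer` computes these values efficiently; the function names here are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition take their value from `baseline`.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(shap_rows: Sequence[Sequence[float]]) -> List[float]:
    """Global importance per feature: mean |SHAP value| across predictions."""
    n_rows = len(shap_rows)
    return [sum(abs(row[j]) for row in shap_rows) / n_rows
            for j in range(len(shap_rows[0]))]
```

Ranking features by `mean_abs_importance` is how a statement like "maximum intrahost frequency was the most important feature" is typically read off SHAP output.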
Our models performed pretty well for each timeframe (r²=0.53-0.68), even when we trained models on one timeframe and tested on another (r²=0.52), suggesting that the data patterns learnt by our models are highly generalisable. 9/
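The tweet does not specify whether r² is the squared Pearson correlation or the coefficient of determination. The sketch below assumes the latter, 1 - SS_res/SS_tot, and shows how cross-timeframe transfer can be scored by applying a model fitted on one timeframe to another timeframe's data. Function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def evaluate_on_timeframe(predict: Callable[[Sequence[float]], float],
                          X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Score a model (possibly trained on another timeframe) on this timeframe's data."""
    return r_squared(y, [predict(row) for row in X])
```

A model that always predicts the mean scores 0 under this definition, so values around 0.5 indicate substantial explained variance.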
The predictors of mutation fitness used: 1. Intrahost: as derived from intrahost frequencies. 2. Physicochemical: charge, mol. weight, hydrophobicity, BLOSUM62 score. 3. RBD phenotypes: binding, expression, antibody escape (from @jbloom_lab). 8/
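A rough sketch of how the physicochemical predictors could be derived from an amino-acid change label such as `N501Y`, as differences between the alternative and reference residues. The choice of scales (Kyte-Doolittle hydropathy, a simplified ±1 side-chain charge, free amino-acid molecular weights) is an assumption, since the thread does not name them. BLOSUM62 scores are omitted here; a full matrix is available in Biopython via `Bio.Align.substitution_matrices.load("BLOSUM62")`.

```python
from typing import Dict, Tuple

# Kyte-Doolittle hydropathy index per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge at neutral pH (His treated as neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}
# Average molecular weight of the free amino acid, in daltons.
MOL_WEIGHT = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split an amino-acid change such as 'N501Y' into ('N', 501, 'Y')."""
    return label[0], int(label[1:-1]), label[-1]


def physicochemical_deltas(label: str) -> Dict[str, float]:
    """Change (alt minus ref) in hydropathy, charge and molecular weight."""
    ref, _pos, alt = parse_mutation(label)
    return {
        "delta_hydrophobicity": KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref],
        "delta_charge": CHARGE.get(alt, 0) - CHARGE.get(ref, 0),
        "delta_mol_weight": MOL_WEIGHT[alt] - MOL_WEIGHT[ref],
    }
```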
To formally test whether intrahost freq. patterns were a good predictor of success, we trained separate XGBoost regression models, one for each timeframe, to forecast the fitness of intrahost mutations measured after each sampling timeframe (i.e., future fitness). 7/
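XGBoost builds an ensemble of regression trees by gradient boosting. The standard-library sketch below is a deliberately simplified stand-in, not XGBoost itself: plain gradient boosting on squared loss with depth-1 trees (stumps), without XGBoost's second-order updates, regularisation or subsampling. It fits one independent model per timeframe, mirroring the setup described above. In practice one would use `xgboost.XGBRegressor`; the helper names and default hyperparameters here are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: Sequence[Row], residuals: Sequence[float]) -> Optional[Stump]:
    """Best single split (depth-1 tree) for the residuals under squared error."""
    best_sse, best = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lm, rm)
    return best


def fit_boosted(X: Sequence[Row], y: Sequence[float], n_rounds: int = 50,
                learning_rate: float = 0.1) -> Callable[[Row], float]:
    """Gradient boosting with stumps on squared loss (residual fitting)."""
    base = mean(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature can be split
            break
        j, thr, lm, rm = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lm if row[j] <= thr else rm)
                 for p, row in zip(preds, X)]

    def predict(row: Row) -> float:
        return base + learning_rate * sum(
            lm if row[j] <= thr else rm for j, thr, lm, rm in stumps)

    return predict


def fit_per_timeframe(data: Dict[str, Tuple[List[Row], List[float]]]
                      ) -> Dict[str, Callable[[Row], float]]:
    """One independent model per sampling timeframe."""
    return {tf: fit_boosted(X, y) for tf, (X, y) in data.items()}
```

Each round fits a stump to the current residuals and adds a shrunken copy of its output, so predictions approach the targets gradually; the learning rate trades speed for robustness.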
We found that the intrahost mutations that eventually become highly successful only reach peak frequency in GISAID genomes at a median of 6-40 months after the timeframes of our datasets. 6/
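A minimal sketch of the lag calculation, assuming each mutation's GISAID frequency is available as a monthly series keyed by date. Months are counted as calendar-month differences, a simplification; the function names and input layout are hypothetical.

```python
from datetime import date
from statistics import median
from typing import Dict, Iterable


def peak_month(monthly_freq: Dict[date, float]) -> date:
    """Month (keyed by any date in it) at which a mutation's frequency peaked."""
    return max(monthly_freq, key=monthly_freq.get)


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def median_lag_to_peak(timeframe_end: date, peak_dates: Iterable[date]) -> float:
    """Median delay, in months, between the end of a sampling timeframe and the peaks."""
    return median(months_between(timeframe_end, d) for d in peak_dates)
```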
To test this, we curated and analysed the intrahost mutation frequencies of ~8000 SARS-CoV-2 sequencing libraries. These libraries represent random samples of SARS-CoV-2 infections collected across seven distinct sampling timeframes in the pandemic. 5/
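As an illustration of the headline intrahost feature, the sketch below takes per-library variant calls and records each mutation's maximum intrahost frequency across libraries. The record layout and mutation labels (e.g. `S:N501Y`) are assumptions; the thread does not describe the variant-calling pipeline or its thresholds.

```python
from typing import Dict, Iterable, Tuple

# Each record: (library_id, mutation_label, intrahost allele frequency).
Record = Tuple[str, str, float]


def max_intrahost_frequency(records: Iterable[Record]) -> Dict[str, float]:
    """Highest within-library frequency observed for each mutation across libraries."""
    best: Dict[str, float] = {}
    for _library, mutation, freq in records:
        if freq > best.get(mutation, 0.0):
            best[mutation] = freq
    return best
```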