Cedric C.S. Tan

@cedriccstan

Followers
254
Following
172
Media
32
Statuses
190

PhD student at @TheCrick and @UGI_at_UCL. Computational biologist working broadly on pathogen (meta)genomics.

Joined September 2021
@cedriccstan
Cedric C.S. Tan
2 years
Pleased to share that this project from my PhD is now published in @NatureEcoEvo! Many thanks to my amazing supervisors @BallouxFrancois and @LucyvanDorp - I could not have done it without them. Link to paper -> https://t.co/bdomUdp8YP (1)
@cedriccstan
Cedric C.S. Tan
2 years
New preprint (https://t.co/Yzh5zdQSrk) out! We explore the following: 1. What is the current state of viral genomic surveillance? 2. Do we give more viruses to animals than they give to us? 3. Why are multi-host viruses more at risk of jumping into humans? Summary🧵 below (1)
2
17
41
@BallouxFrancois
Prof Francois Balloux
3 months
New preprint analysing microbial signatures of Hospital-Acquired Pneumonia (HAP). HAP is a major cause of morbidity / mortality, yet it remains poorly defined microbiologically. We profiled the respiratory microbiomes of 250 HAP patients in the UK. Metagenomic sequencing detects
2
16
53
@vsbuffalo
Vince Buffalo
1 year
This random BAM file on the SRA was mysteriously breaking my pipeline. Why? It was full of someone's ls -l output 🫠
15
22
253
@AsafLevyHUJI
Asaf Levy אסף לוי اسف ليڤي
1 year
I am excited to share our work "Systematic discovery of antibacterial and antifungal bacterial toxins" that is published in @NatureMicrobiol https://t.co/bSqF1RUD99
15
90
355
@cedriccstan
Cedric C.S. Tan
1 year
Link to preprint: https://t.co/LgHjp1Stuk 20/
0
0
0
@cedriccstan
Cedric C.S. Tan
1 year
Many thanks and congratulations to my co-authors: Marina Escalera-Zamudio, Alexei Yavlinsky, @LucyvanDorp and @BallouxFrancois! 19/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
Nevertheless, our results highlight the value of using intrahost dynamics to predict mutation success, which we think can be easily ported to other pathogen systems. 18/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
Our current models are far from perfect, so maybe we could include other evolutionary, immunological and epidemiological predictors of mutation fitness. 17/
1
0
1
@cedriccstan
Cedric C.S. Tan
1 year
Overall, we show that the intrahost diversity of viral infections, when combined with other genetic and phenotypic effects, could be used to predict the future fitness of mutations. 16/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
And indeed, adding genetic linkage into our models improved their predictions, especially for the fitter mutations. The SHAP analyses confirm that our linkage predictors are the reason for this improvement. 15/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
We thought that perhaps the missing link is genetic linkage, or the co-occurrence of mutations, which in some cases may boost their fitness (i.e., epistasis). 14/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
However, we noticed that the prediction errors for our models tended to be higher for the fitter mutations, suggesting that perhaps our models are still missing something... 13/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
The patterns of other physicochemical and phenotypic predictors also reflect important evolutionary concepts, but I won't go into detail here. Importantly, the model interpretation analyses suggest that our models were picking up on biologically relevant patterns. 12/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
Based on the SHAP analysis, our models predict higher fitness values for mutations with higher max. intrahost frequencies. In other words, mutations with a high intrahost frequency are also more likely to be fitter in the future. 11/
1
0
0
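As a rough illustration of the "max. intrahost frequency" idea, here is a minimal sketch of collapsing per-library intrahost frequencies into one maximum value per mutation. The input layout (one dict per sequencing library), the function name and the mutation labels are all assumptions made for this example, and the numbers are invented; none of it is taken from the preprint.

```python
from collections import defaultdict


def max_intrahost_frequency(libraries):
    """Return the highest intrahost frequency seen for each mutation.

    `libraries` is a hypothetical input layout: an iterable of dicts,
    one per sequencing library, mapping a mutation label to its
    intrahost allele frequency (a fraction between 0 and 1).
    """
    best = defaultdict(float)
    for freqs in libraries:
        for mutation, freq in freqs.items():
            if not 0.0 <= freq <= 1.0:
                raise ValueError(f"frequency out of range for {mutation}: {freq}")
            # Keep the largest frequency observed across all libraries.
            best[mutation] = max(best[mutation], freq)
    return dict(best)


# Invented toy values, for illustration only.
toy_libraries = [
    {"mut_A": 0.04, "mut_B": 0.10},
    {"mut_A": 0.35},
    {"mut_B": 0.02},
]
toy_result = max_intrahost_frequency(toy_libraries)
```

Taking the maximum rather than the mean keeps a mutation that rose to high frequency in even a single infection from being averaged away by the many libraries where it is rare or absent.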
@cedriccstan
Cedric C.S. Tan
1 year
To understand what our models were actually learning, we employed the SHAP model explanation framework developed by Lundberg et al. Maximum intrahost frequency was consistently the most important feature. 10/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
Our models performed pretty well for each timeframe (r2=0.53-0.68), even when we trained models on one timeframe and tested on another (r2=0.52), suggesting that the data patterns learnt by our models are highly generalisable. 9/
1
0
0
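For readers unfamiliar with the metric, here is a minimal sketch of how an r2 score can be computed for a model trained on one timeframe and tested on another. It assumes that r2 means the coefficient of determination (1 - SS_res / SS_tot); the thread does not say whether it is that or a squared correlation. The function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    `observed` would be the fitness values measured in the held-out
    timeframe, and `predicted` the model's forecasts for the same
    mutations (hypothetical usage, not the authors' code).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variance around the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; r2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Under this definition, a score of 1 means perfect prediction, and a score of 0 means the model does no better than always predicting the mean observed fitness.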
@cedriccstan
Cedric C.S. Tan
1 year
The predictors of mutation fitness used: 1. Intrahost: as derived from intrahost frequencies. 2. Physicochemical: charge, mol. weight, hydrophobicity, BLOSUM62 score. 3. RBD phenotypes: binding, expression, antibody escape (by @jbloom_lab). 8/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
To formally test whether intrahost freq. patterns were a good predictor of success, we trained separate XGBoost regression models, one for each sampling timeframe, to forecast the fitness of intrahost mutations measured after that timeframe (i.e., future fitness). 7/
1
0
1
@cedriccstan
Cedric C.S. Tan
1 year
We found that the intrahost mutations that eventually become highly successful only reach peak frequency in GISAID genomes at a median of 6-40 months after the timeframes of our datasets. 6/
1
0
0
@cedriccstan
Cedric C.S. Tan
1 year
To test this, we curated and analysed the intrahost mutation frequencies of ~8000 SARS-CoV-2 sequencing libraries. These libraries represent random samples of SARS-CoV-2 infections collected across seven distinct sampling timeframes in the pandemic. 5/
1
0
0