Cedric C.S. Tan

@cedriccstan

Followers
257
Following
170
Media
32
Statuses
189

PhD student at @TheCrick and @UGI_at_UCL. Computational biologist working broadly on pathogen (meta)genomics.

Joined September 2021
@cedriccstan
Cedric C.S. Tan
1 year
Pleased to share that this project from my PhD is now published in @NatureEcoEvo! Many thanks to my amazing supervisors @BallouxFrancois and @LucyvanDorp - I could not have done it without them. Link to paper -> (1).
@cedriccstan
Cedric C.S. Tan
2 years
New preprint out! We explore the following:
1. What is the current state of viral genomic surveillance?
2. Do we give more viruses to animals than they give to us?
3. Why are multi-host viruses more at risk of jumping into humans?
Summary 🧵 below (1).
2
19
42
@cedriccstan
Cedric C.S. Tan
1 month
Glad to be involved in this work on the reductive evolution of Borrelia, now out in @ScienceMagazine. Led by @Pooja_Swali and @LucyvanDorp. Congrats to all co-authors!
0
3
16
@cedriccstan
Cedric C.S. Tan
8 months
RT @vsbuffalo: This random BAM file on the SRA was mysteriously breaking my pipeline. Why? It was full of someone's ls -l output 🫠 https://…
0
22
0
@cedriccstan
Cedric C.S. Tan
9 months
RT @AsafLevyHUJI: I am excited to share our work "Systematic discovery of antibacterial and antifungal bacterial toxins" that is published…
0
91
0
@cedriccstan
Cedric C.S. Tan
9 months
Link to preprint: 20/
0
0
0
@cedriccstan
Cedric C.S. Tan
9 months
Many thanks and congratulations to my co-authors: Marina Escalera-Zamudio, Alexei Yavlinsky, @LucyvanDorp and @BallouxFrancois! 19/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
Nevertheless, our results highlight the value of using intrahost dynamics to predict mutation success, which we think can be easily ported to other pathogen systems. 18/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
Our current models are far from perfect, so maybe we could include other evolutionary, immunological and epidemiological predictors of mutation fitness. 17/
1
0
1
@cedriccstan
Cedric C.S. Tan
9 months
Overall, we show that the intrahost diversity of viral infections, when combined with other genetic and phenotypic effects, could be used to predict the future fitness of mutations. 16/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
And indeed, adding genetic linkage into our models improved their predictions, especially for the fitter mutations. The SHAP analyses confirm that our linkage predictors are the reason for this improvement. 15/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
We thought that perhaps the missing link is genetic linkage, or the co-occurrence of mutations, which in some cases may boost their fitness (i.e., epistasis). 14/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
However, we noticed that the prediction errors for our models tended to be higher for the fitter mutations, suggesting that perhaps our models are still missing something. 13/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
The patterns of other physicochemical and phenotypic predictors also reflect important evolutionary concepts, but I won't go into detail here. Importantly, the model interpretation analyses suggest that our models were picking up on biologically relevant patterns. 12/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
Based on the SHAP analysis, our models predict higher fitness values for mutations with higher max. intrahost frequencies. In other words, mutations with a high intrahost frequency are also more likely to be fitter in the future. 11/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
To understand what our models were actually learning, we employed the SHAP model explanation framework developed by Lundberg et al. Maximum intrahost frequency was consistently the most important feature. 10/
1
0
0
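A minimal sketch of how a per-mutation "maximum intrahost frequency" feature could be derived, assuming the input is a flat list of (library, mutation, frequency) records. The record layout, the function name `summarise_intrahost`, the library IDs and the toy frequencies are all illustrative assumptions, not the preprint's actual pipeline or data.

```python
from collections import defaultdict


def summarise_intrahost(records):
    """Summarise intrahost observations per mutation.

    `records` is an iterable of (library_id, mutation, frequency) tuples,
    where frequency is the within-sample allele frequency in [0, 1].
    This input layout is an assumption made for illustration.
    """
    max_freq = defaultdict(float)
    libraries = defaultdict(set)
    for library_id, mutation, freq in records:
        if not 0.0 <= freq <= 1.0:
            raise ValueError(f"frequency out of range for {mutation}: {freq}")
        # Highest frequency this mutation reached in any single library.
        max_freq[mutation] = max(max_freq[mutation], freq)
        # Distinct libraries in which the mutation was seen at all.
        libraries[mutation].add(library_id)
    return {
        m: {"max_freq": max_freq[m], "n_libraries": len(libraries[m])}
        for m in max_freq
    }


# Toy, made-up records (not real measurements).
toy_records = [
    ("lib1", "S:N501Y", 0.12),
    ("lib2", "S:N501Y", 0.40),
    ("lib2", "S:E484K", 0.05),
]
features = summarise_intrahost(toy_records)
```

Each entry of `features` could then serve as one row of intrahost predictors for a downstream regression model.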
@cedriccstan
Cedric C.S. Tan
9 months
Our models performed pretty well for each timeframe (r2=0.53-0.68), even when we trained models on one timeframe and tested on another (r2=0.52), suggesting that the data patterns learnt by our models are highly generalisable. 9/
1
0
0
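For readers unfamiliar with the metric in 9/, here is a minimal sketch of one common definition of r2, the coefficient of determination. The thread does not say which definition was used (it may instead be a squared correlation), and the function name and toy numbers below are illustrative assumptions, not values from the preprint.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; r2 is undefined")
    return 1.0 - ss_res / ss_tot


# Toy, made-up fitness values for a held-out timeframe.
observed_fitness = [0.10, 0.40, 0.35, 0.80]
predicted_fitness = [0.15, 0.30, 0.45, 0.70]
score = r_squared(observed_fitness, predicted_fitness)
```

Under this definition, a score of 1 means perfect predictions and 0 means the model does no better than always predicting the mean, so a cross-timeframe score would be computed by passing in the observed values from the timeframe the model was not trained on.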
@cedriccstan
Cedric C.S. Tan
9 months
The predictors of mutation fitness used:
1. Intrahost: as derived from intrahost frequencies.
2. Physicochemical: charge, mol. weight, hydrophobicity, BLOSUM62 score.
3. RBD phenotypes: binding, expression, antibody escape (by @jbloom_lab).
8/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
To formally test whether intrahost freq. patterns were a good predictor of success, we trained separate XGBoost regression models to forecast the fitness of intrahost mutations measured after each sampling timeframe (i.e., future fitness), one for each timeframe. 7/
1
0
1
@cedriccstan
Cedric C.S. Tan
9 months
We found that the intrahost mutations that eventually become highly successful only reach peak frequency in GISAID genomes at a median of 6-40 months after the timeframes of our datasets. 6/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
To test this, we curated and analysed the intrahost mutation frequencies of ~8000 SARS-CoV-2 sequencing libraries. These libraries represent random samples of SARS-CoV-2 infections collected across seven distinct sampling timeframes in the pandemic. 5/
1
0
0
@cedriccstan
Cedric C.S. Tan
9 months
Inspired by this, we hypothesised that fitter mutations would be observed more often and at higher frequencies within infections even before they are observed in the consensus genomes submitted to public databases. 4/
1
0
0