Cedric C.S. Tan @cedriccstan X Profile

Cedric C.S. Tan

@cedriccstan

Followers

254

Following

172

Media

32

Statuses

190

PhD student at @TheCrick and @UGI_at_UCL. Computational biologist working broadly on pathogen (meta)genomics.

https://t.co/QZrLzYSJfa

Joined September 2021


Cedric C.S. Tan

@cedriccstan

2 years

Pleased to share that this project from my PhD is now published in @NatureEcoEvo! Many thanks to my amazing supervisors @BallouxFrancois and @LucyvanDorp - I could not have done it without them. Link to paper -> https://t.co/bdomUdp8YP (1)

Cedric C.S. Tan

@cedriccstan

2 years

New preprint (https://t.co/Yzh5zdQSrk) out! We explore the following: 1. What is the current state of viral genomic surveillance? 2. Do we give more viruses to animals than they give to us? 3. Why are multi-host viruses more at risk of jumping into humans? Summary🧵 below (1)

2

17

41

Prof Francois Balloux

@BallouxFrancois

3 months

New preprint analysing microbial signatures of Hospital-Acquired Pneumonia (HAP). HAP is a major cause of morbidity / mortality, yet it remains poorly defined microbiologically. We profiled the respiratory microbiomes of 250 HAP patients in the UK. Metagenomic sequencing detects

2

16

53

Cedric C.S. Tan

@cedriccstan

6 months

Glad to be involved in this work on the reductive evolution of Borrelia, now out in @ScienceMagazine. Led by @Pooja_Swali and @LucyvanDorp. Congrats to all co-authors! https://t.co/yZ6vZi95Ps

science.org

Several bacterial pathogens have transitioned from tick-borne to louse-borne transmission, which often involves genome reduction and increasing virulence. However, the timing of such transitions...

0

3

16

Vince Buffalo

@vsbuffalo

1 year

This random BAM file on the SRA was mysteriously breaking my pipeline. Why? It was full of someone's ls -l output 🫠

15

22

253

Asaf Levy אסף לוי اسف ليڤي

@AsafLevyHUJI

1 year

I am excited to share our work "Systematic discovery of antibacterial and antifungal bacterial toxins" that is published in @NatureMicrobiol https://t.co/bSqF1RUD99

15

90

355

Cedric C.S. Tan

@cedriccstan

1 year

Link to preprint: https://t.co/LgHjp1Stuk 20/

0

Cedric C.S. Tan

@cedriccstan

1 year

Many thanks and congratulations to my co-authors: Marina Escalera-Zamudio, Alexei Yavlinsky, @LucyvanDorp and @BallouxFrancois! 19/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

Nevertheless, our results highlight the value of using intrahost dynamics to predict mutation success, which we think can be easily ported to other pathogen systems. 18/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

Our current models are far from perfect, so maybe we could include other evolutionary, immunological and epidemiological predictors of mutation fitness. 17/

1

0

1

Cedric C.S. Tan

@cedriccstan

1 year

Overall, we show that the intrahost diversity of viral infections, when combined with other genetic and phenotypic effects, could be used to predict the future fitness of mutations. 16/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

And indeed, adding genetic linkage into our models improved their predictions, especially for the fitter mutations. The SHAP analyses confirm that our linkage predictors are the reason for this improvement. 15/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

We thought that perhaps the missing link is genetic linkage, or the co-occurrence of mutations, which in some cases may boost their fitness (i.e., epistasis). 14/

1

0
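The thread describes linkage as the co-occurrence of mutations. A minimal sketch of that idea is to count how often each pair of mutations is seen in the same sequencing library. The input format (one list of mutation labels per library) and the function name `cooccurrence_counts` are assumptions for illustration; the thread does not say how the actual linkage predictors were derived.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(libraries):
    """Count how many libraries contain each unordered pair of mutations.

    `libraries` is a hypothetical input: an iterable of mutation-label lists,
    one list per sequencing library.
    """
    counts = Counter()
    for mutations in libraries:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(mutations)), 2):
            counts[pair] += 1
    return counts


# Toy data, not taken from the preprint.
example_libraries = [
    ["D614G", "N501Y"],
    ["N501Y", "D614G", "E484K"],
    ["E484K"],
]
pair_counts = cooccurrence_counts(example_libraries)
```

Pair counts like these could then be turned into per-mutation features, for example the number of distinct partners a mutation co-occurs with.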

Cedric C.S. Tan

@cedriccstan

1 year

However, we noticed that the prediction errors for our models tended to be higher for the fitter mutations, suggesting that perhaps our models are still missing something... 13/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

The patterns of other physicochemical and phenotypic predictors also reflect important evolutionary concepts, but I won't go into detail here. Importantly, the model interpretation analyses suggest that our models were picking up on biologically relevant patterns. 12/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

Based on the SHAP analysis, our models predict higher fitness values for mutations with higher max. intrahost frequencies. In other words, mutations with a high intrahost frequency are also more likely to be fitter in the future. 11/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

To understand what our models were actually learning, we employed the SHAP model explanation framework developed by Lundberg et al. Maximum intrahost frequency was consistently the most important feature. 10/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

Our models performed pretty well for each timeframe (r² = 0.53-0.68), even when we trained models on one timeframe and tested on another (r² = 0.52), suggesting that the data patterns learnt by our models are highly generalisable. 9/

1

0
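The performance figures above are reported as r-squared values. The sketch below computes the squared Pearson correlation between observed and predicted fitness, which is one common reading of that metric. Whether the thread refers to this or to the coefficient of determination is not stated, so the choice here is an assumption, as is the function name `r_squared`.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("r-squared is undefined for constant input")
    return cov ** 2 / (var_o * var_p)


# Toy values, not results from the preprint.
score = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```

For the cross-timeframe check described in the thread, the same function would be applied to predictions from a model trained on one timeframe and observations from another.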

Cedric C.S. Tan

@cedriccstan

1 year

The predictors of mutation fitness used: 1. Intrahost: as derived from intrahost frequencies. 2. Physicochemical: charge, mol. weight, hydrophobicity, BLOSUM62 score. 3. RBD phenotypes: binding, expression, antibody escape (by @jbloom_lab). 8/

1

0
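As a rough illustration of the physicochemical predictors listed above, the sketch below computes the change in charge and hydrophobicity for a single amino acid substitution. The Kyte-Doolittle hydropathy scale, the simplified side-chain charge table (histidine treated as neutral) and the function name `physicochemical_deltas` are assumptions; the thread does not say which scales were used.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the thread names none).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Approximate side-chain charge near neutral pH (assumption: His is neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def physicochemical_deltas(mutation):
    """Return charge and hydropathy changes for a label such as 'N501Y'."""
    ref, alt = mutation[0], mutation[-1]
    if ref not in HYDROPATHY or alt not in HYDROPATHY:
        raise ValueError(f"unrecognised amino acid in {mutation!r}")
    return {
        "delta_charge": CHARGE.get(alt, 0) - CHARGE.get(ref, 0),
        "delta_hydropathy": round(HYDROPATHY[alt] - HYDROPATHY[ref], 2),
    }


features = physicochemical_deltas("D614G")
```

Molecular weight and BLOSUM62 score would be added the same way, each as one more lookup keyed on the reference and alternate residues.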

Cedric C.S. Tan

@cedriccstan

1 year

To formally test whether intrahost freq. patterns were a good predictor of success, we trained separate XGBoost regression models, one for each timeframe, to forecast the fitness of intrahost mutations measured after each sampling timeframe (i.e., future fitness). 7/

1

0

1

Cedric C.S. Tan

@cedriccstan

1 year

We found that the intrahost mutations that eventually become highly successful only reach peak frequency in GISAID genomes at a median of 6-40 months after the timeframes of our datasets. 6/

1

0

Cedric C.S. Tan

@cedriccstan

1 year

To test this, we curated and analysed the intrahost mutation frequencies of ~8000 SARS-CoV-2 sequencing libraries. These libraries represent random samples of SARS-CoV-2 infections collected across seven distinct sampling timeframes in the pandemic. 5/

1

0