Po-Yu Lin
@PoYu_Lin_NCKUH
Joined September 2025
We expect P-KNN will help improve diagnostic yield in rare disease genetics and make variant interpretation more robust, flexible, and up to date.
P-KNN is available here:
💻 Command line tool: https://t.co/xx1lkMWGTR
Precomputed scores for all missense variants in the human genome: https://t.co/aa6MLJQyUY
Preprint:
biorxiv.org
Clinical guidelines for Mendelian disease diagnosis require that outputs from variant pathogenicity prediction tools be converted into well-calibrated probabilities. However, the existing calibration...
In summary, P-KNN
🔹 Flexibly uses any combination of tools - no pre-commitment required
🔹 Delivers stronger, better-calibrated evidence than a calibrated meta-predictor
🔹 Is fully compatible with the ACMG/AMP Bayesian framework
Computational tools and deep mutational scans both provide pathogenicity scores. ⚠️ Guidelines treat them as separate evidence and add their log likelihood ratios, but they're not independent, risking miscalibration. ✅ P-KNN integrates them into one well-calibrated probability.
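A short derivation of why adding log likelihood ratios needs independence. The symbols here are illustrative notation, not taken from the preprint: s_1 is a computational tool score and s_2 a deep mutational scan score. Bayes' rule always holds for the joint likelihood ratio, but that ratio only splits into a product (a sum on the log scale) when the two scores are conditionally independent given the variant's class.

```latex
\frac{P(\text{pathogenic} \mid s_1, s_2)}{P(\text{benign} \mid s_1, s_2)}
  = \frac{P(\text{pathogenic})}{P(\text{benign})}
    \cdot
    \underbrace{\frac{P(s_1, s_2 \mid \text{pathogenic})}{P(s_1, s_2 \mid \text{benign})}}_{\mathrm{LR}_{12}}

% Only if s_1 and s_2 are conditionally independent given the class:
\log \mathrm{LR}_{12} = \log \mathrm{LR}_1 + \log \mathrm{LR}_2,
\qquad
\mathrm{LR}_i = \frac{P(s_i \mid \text{pathogenic})}{P(s_i \mid \text{benign})}
```

When the scores are correlated within a class, the sum on the right no longer equals the joint log likelihood ratio on the left, which is the miscalibration risk described above.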
It's striking how such a simple idea can be so effective - P-KNN consistently turns diverse tool outputs into stronger, well-calibrated evidence.
P-KNN is very flexible: it can work with any set of tools, and it keeps improving as new tools become available. Our latest combination even outperforms AlphaMissense in evidence strength.
Meta-predictors (e.g., REVEL, BayesDel) also integrate tools, but still need separate calibration. P-KNN combines integration and joint calibration in one step, generating stronger evidence and better calibration.
This simple method works surprisingly well. It delivers: 1. Stronger evidence, and 2. More accurate, reliable calibration than single-tool approaches.
To overcome these limits, we developed Pathogenicity K-Nearest Neighbors (P-KNN). P-KNN jointly calibrates any set of tools into a single pathogenicity probability. It maps scores into a multi-dimensional space and asks: among nearby variants, what fraction are pathogenic?
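A minimal sketch of the nearest-neighbor idea in this post, not the released P-KNN tool. It assumes Euclidean distance, a fixed k, and a made-up calibration set of two-tool score pairs; the function name and data are hypothetical. The real method may normalize scores, choose k differently, or adjust for the prior, none of which is shown here.

```python
import math

def knn_pathogenic_fraction(query, calib_scores, calib_labels, k=3):
    """Fraction of pathogenic labels among the k calibration variants
    closest to `query` in the multi-tool score space (illustrative only)."""
    by_distance = sorted(
        (math.dist(query, scores), label)
        for scores, label in zip(calib_scores, calib_labels)
    )
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / len(nearest)

# Toy calibration set: each tuple holds scores from two hypothetical tools;
# label 1 = pathogenic, 0 = benign.
calib_scores = [(0.90, 0.80), (0.85, 0.90), (0.80, 0.70),
                (0.20, 0.10), (0.10, 0.30), (0.30, 0.20)]
calib_labels = [1, 1, 0, 0, 0, 1]

high_query = knn_pathogenic_fraction((0.88, 0.82), calib_scores, calib_labels)
low_query = knn_pathogenic_fraction((0.15, 0.20), calib_scores, calib_labels)
print(high_query, low_query)
```

Note that the raw fraction reflects the class balance of the calibration set, so a practical version would need to account for the prior probability of pathogenicity.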
To keep probabilities calibrated, clinicians must pre-commit to one predictor, no switching. This creates two issues: 1. Tool choice is unclear: each excels in different cases, no clear guidance. 2. Pre-commitment blocks future insights: no switching during re-analysis.
The existing framework allows calibration of only one tool at a time. Using a calibration dataset with labeled pathogenic and benign variants, each prediction score is calibrated into the proportion of pathogenic variants among the variants with similar scores.
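A rough illustration of single-tool calibration as described in this post: the score is mapped to the proportion of pathogenic variants among calibration variants with similar scores. The fixed-width window, the function name, and the toy data are assumptions for the sketch; the existing framework's actual windowing rules are not given in the thread.

```python
def local_pathogenic_proportion(score, calib, half_width=0.1):
    """Proportion of pathogenic labels among calibration variants whose
    score lies within `half_width` of `score`; None if the window is empty."""
    in_window = [label for s, label in calib if abs(s - score) <= half_width]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)

# Toy calibration data for one hypothetical tool: (score, label),
# label 1 = pathogenic, 0 = benign.
calib = [(0.05, 0), (0.10, 0), (0.18, 0),
         (0.45, 0), (0.50, 1), (0.55, 0),
         (0.82, 1), (0.90, 1), (0.95, 1)]

low = local_pathogenic_proportion(0.10, calib)
mid = local_pathogenic_proportion(0.50, calib)
high = local_pathogenic_proportion(0.90, calib)
sparse = local_pathogenic_proportion(0.30, calib)  # no neighbors in range
print(low, mid, high, sparse)
```

Because the window is defined on one score axis, the sketch has no way to take a second tool into account, which is the limitation the post points to.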
Clinical guidelines require variant pathogenicity predictions to be calibrated into probabilities.
Machine-learning predictions are widely used to classify genetic variants as pathogenic. A key obstacle is the clinical guideline requiring pre-commitment to one tool. Our approach lifts this restriction, enabling use of multiple tools with complementary strengths.
So when the two groups are merged, almost all pathogenic variants are splice and almost all benign variants are 5′ UTR. You end up with almost perfect separation, just because the model knows to assign more damaging predictions to splice variants.
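The pooling effect can be reproduced with synthetic numbers. This sketch uses invented scores (not Evo2 outputs or benchmark data) and a plain pairwise AUROC: within each variant type the score carries no information (AUROC 0.5), yet the merged set looks far better because one type scores higher and is mostly pathogenic.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random pathogenic score exceeds a random benign
    score, counting ties as one half (pairwise definition of AUROC)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Synthetic scores: higher = predicted more damaging.
splice_pathogenic = [0.7, 0.9, 0.7, 0.9]   # splice: mostly pathogenic
splice_benign = [0.8]
utr_pathogenic = [0.2]                     # 5' UTR: mostly benign
utr_benign = [0.1, 0.3, 0.1, 0.3]

auroc_splice = auroc(splice_pathogenic, splice_benign)
auroc_utr = auroc(utr_pathogenic, utr_benign)
auroc_pooled = auroc(splice_pathogenic + utr_pathogenic,
                     splice_benign + utr_benign)
print(auroc_splice, auroc_utr, auroc_pooled)
```

In this toy case the pooled AUROC is driven entirely by the gap between variant types, not by any ability to rank variants within a type.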
It's basically Simpson's paradox. To illustrate what's happening, let's look at Evo2 for splice & 5′ UTR variants. Neither group shows good separation between pathogenic & benign variants, but splice variants get more damaging predictions & are much more likely to be pathogenic.
Measuring model performance by variant type reveals a big anomaly. Evo2, for example, scores AUROC=0.975 on noncoding variants, but much lower on each specific type (e.g. 0.697 for splice, 0.903 for intron, 0.767 for 5′ UTR). Other models show a similar pattern. What's going on?
We created a benchmark of ~250,000 pathogenic & benign variants. Unlike previous benchmarks, we evaluated performance by variant type. We broke down broad categories like "noncoding variants" into specific annotations like intron, 3′ UTR and RNA gene.