Nadav Brandes Profile
Nadav Brandes

@BrandesNadav

Followers
896
Following
396
Media
36
Statuses
154

#ComputationalBiology and #AI

New York, USA
Joined June 2016
@BrandesNadav
Nadav Brandes
6 days
We're a small and supportive center. Our director, Aravinda Chakravarti, is not only a renowned geneticist but also an incredible colleague who cares deeply about the success of each of the labs.
0
0
0
@BrandesNadav
Nadav Brandes
6 days
We’re hiring tenure-track faculty in the Center for Human Genetics and Genomics at NYU! If you have ambitious ideas for transforming human genetics, this is the place for you. https://t.co/8KjyUxmk5m
1
0
0
@BrandesNadav
Nadav Brandes
2 months
I finished reading “If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All”. I liked this book. It gives an intelligent explanation for why some people are extremely concerned about the rapid progress in AI. It makes the case that the ongoing efforts to build
0
0
6
@BrandesNadav
Nadav Brandes
2 months
I'm really excited about this work from a great student in my lab. It addresses a big gap in the clinical implementation of variant effect predictions, providing well-calibrated pathogenicity probabilities from multiple predictions (without data-expensive meta-predictors).
@PoYu_Lin_NCKUH
Po-Yu Lin
2 months
Machine-learning predictions are widely used to classify genetic variants as pathogenic. A key obstacle is the clinical guideline requiring pre-commitment to one tool. Our approach lifts this restriction, enabling use of multiple tools with complementary strengths.
0
2
16
@BrandesNadav
Nadav Brandes
3 months
This work was done by three incredible students! @PoYu_Lin_NCKUH @BaiyuLu66681 @XueshenLiu
2
0
21
@BrandesNadav
Nadav Brandes
3 months
Lesson: near-perfect prediction of pathogenic variants across ‘all variants’ is an illusion. Variant-type-specific evals are needed to know when models are actually good and useful.
1
3
31
@BrandesNadav
Nadav Brandes
3 months
Model comparison:
-- GPN-MSA = most robust DNA model
-- AlphaMissense = most robust protein model
-- AlphaGenome & Evo2 = strong in some variant types, very unstable in others
No single model is best across the board.
1
3
20
@BrandesNadav
Nadav Brandes
3 months
Once you control for variant type, a clearer picture emerges. Reliable performance (AUROC>0.9) is achieved for missense, synonymous, non-splice intron, 3′ UTR & RNA gene variants. By contrast, stop-gain, start-loss, stop-loss, splice & 5′ UTR variants remain difficult.
2
3
20
@BrandesNadav
Nadav Brandes
3 months
To show how serious this is, we included a simple rule-based baseline that only uses variant type information (no sequences, no AI). It achieves AUROC=0.944 across noncoding variants. The reported numbers suddenly look much less impressive.
2
6
63
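As a hedged sketch of what a "variant type only" baseline can look like (the preprint's actual rules and numbers are not shown here; the per-type scores below are invented for illustration), it can be as simple as a lookup table:

```python
# Invented per-type scores for illustration; the preprint's actual rule-based
# baseline may use different annotations and values.
TYPE_SCORE = {
    "splice": 0.90,
    "5utr": 0.20,
    "3utr": 0.15,
    "intron": 0.10,
    "rna_gene": 0.25,
}

def baseline_score(variant_type: str) -> float:
    """Score a variant from its annotation alone: no sequences, no AI."""
    return TYPE_SCORE[variant_type]

print(baseline_score("splice"))  # 0.9
```

Because benchmark composition differs so much across types, even a scorer this crude can reach a high pooled AUROC, which is the point of the baseline.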
@BrandesNadav
Nadav Brandes
3 months
So when the two groups are merged, almost all pathogenic variants are splice and almost all benign variants are 5’UTR. You end up with almost perfect separation, just because the model knows to assign more damaging predictions to splice variants.
1
0
20
@BrandesNadav
Nadav Brandes
3 months
It’s basically Simpson's paradox. To illustrate what’s happening, let’s look at Evo2 for splice & 5’UTR variants. Neither group shows good separation between pathogenic & benign variants, but splice variants get more damaging predictions & are much more likely to be pathogenic.
1
1
16
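A numerical sketch of this effect (all scores invented): within each variant type the model's scores fail to separate pathogenic from benign (AUROC = 0.5), but pooling two types with different base rates and different typical scores yields a much higher merged AUROC.

```python
def auroc(pos, neg):
    """Probability a random positive scores above a random negative (ties = 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented scores: splice variants get high scores and are mostly pathogenic;
# 5'UTR variants get low scores and are mostly benign.
splice_path, splice_ben = [0.90, 0.80, 0.85, 0.95], [0.87]
utr5_path, utr5_ben = [0.17], [0.20, 0.10, 0.15, 0.25]

print(auroc(splice_path, splice_ben))  # 0.5 within splice
print(auroc(utr5_path, utr5_ben))      # 0.5 within 5'UTR
print(auroc(splice_path + utr5_path, splice_ben + utr5_ben))  # 0.8 pooled
```

The pooled number improves only because splice variants get higher scores than 5'UTR variants, not because the model separates pathogenic from benign within either type.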
@BrandesNadav
Nadav Brandes
3 months
Measuring model performance by variant type reveals a big anomaly. Evo2, for example, scores AUROC=0.975 on noncoding variants, but much lower on all specific types (e.g. 0.697 for splice, 0.903 for intron, 0.767 for 5′ UTR). Other models show a similar pattern. What's going on?
1
1
14
@BrandesNadav
Nadav Brandes
3 months
We created a benchmark of ~250,000 pathogenic & benign variants. Unlike previous benchmarks, we evaluated performance by variant type. We broke down broad categories like ‘noncoding variants’ into specific annotations like intron, 3′ UTR and RNA gene.
1
3
20
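The per-type breakdown can be sketched as a simple grouping step before computing any per-type metric (records, labels, and scores below are invented for illustration):

```python
from collections import defaultdict

# Each record: (specific annotation, label, model score). All values invented.
records = [
    ("intron", 1, 0.70), ("intron", 0, 0.30),
    ("3utr",   1, 0.80), ("3utr",   0, 0.20),
    ("splice", 1, 0.90), ("splice", 0, 0.60),
]

# Group by specific annotation instead of pooling into one 'noncoding' bucket.
by_type = defaultdict(list)
for vtype, label, score in records:
    by_type[vtype].append((label, score))

print(sorted(by_type))  # ['3utr', 'intron', 'splice']
```

Each `by_type[vtype]` list can then be scored separately (e.g. a per-type AUROC), which is what exposes the gap between pooled and type-specific performance.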
@BrandesNadav
Nadav Brandes
3 months
Latest genomic AI models report near-perfect prediction of pathogenic variants (e.g. AUROC>0.97 for Evo2). We ran extensive independent evals and found these figures are true, but very misleading. A breakdown of our new preprint: 🧵
9
118
484
@BrandesNadav
Nadav Brandes
9 months
I sometimes wonder if ChatGPT has a neural-circuitry equivalent of eye rolling, activated whenever it decides to play along despite thinking I'm just being an idiot.
0
0
4
@BrandesNadav
Nadav Brandes
9 months
That said, I’m super excited about the possibilities this work opens up. It’s a long paper, and there are parts I haven’t read yet that look really cool. Thanks for releasing this!
0
0
1
@BrandesNadav
Nadav Brandes
9 months
I still think masked LMs (with bidirectional attention) make more sense for genomics than autoregressive models, especially for variant effect prediction. Also still not convinced that Hyena is better than regular transformers. I guess time will tell.
1
0
4
@BrandesNadav
Nadav Brandes
9 months
I also wish the paper provided more clarity on how exactly they extracted variant effect predictions from Evo2 and other LMs. One oddity: ESM1b performs much worse on non-SNV variants in the Evo2 preprint (ROC-AUC=0.8) than in our 2023 paper (ROC-AUC=0.87) with @vntranos and
2
0
0
@BrandesNadav
Nadav Brandes
9 months
Evo2 classifies noncoding variants almost perfectly, which makes me think these are relatively easy cases (e.g. canonical splice sites). The large context size (~8,000 nt) also suggests it's mostly capturing local effects.
1
0
3