Dmitry Penzar Profile
Dmitry Penzar

@dmitrypenzar

Followers
515
Following
5K
Media
53
Statuses
534

PhD in bioinformatics, ML researcher, teacher

Joined May 2016
@dawnchenx
Dawn Chen
5 days
Announcing our new preprint! We built SPICE, a framework that combines large-scale experiments and generative AI to design RNA sequences that control cell type-specific gene expression using alternative splicing - a powerful new modality! (1/10) Preprint:
biorxiv.org
Programmable control of gene expression in specific cell types is essential for both basic discovery and therapeutic intervention, yet current strategies lack scalability across diverse cellular...
2
18
90
@dmitrypenzar
Dmitry Penzar
9 days
Now we just need to find those scientists. I’d start with Bluesky — maybe find one specialist to make their newsfeed a bit less of a mess
@MatthewBJane
Matthew B. Jané
11 days
Not everything needs to be a peer-reviewed academic paper.
0
0
2
@STLChessClub
Saint Louis Chess Club
22 days
We are deeply saddened by the unexpected passing of Grandmaster Daniel Naroditsky. Daniel was not only a friend of the Saint Louis Chess Club, but a gifted player, educator, and beloved pillar of the chess community. His passion for the game and commitment to teaching inspired
44
247
3K
@krishras23
Krish Rastogi
22 days
We lost an educator, a player, a coach, and one of the best speed chess players of this generation. Rest in peace Naroditsky, we all miss you. I'm also going to leave this clip here, it speaks for itself.
@VBkramnik
Vladimir Kramnik
22 days
Seemingly, conflicts with @chesscom, @freestylechess1, both kicking him out from commentator role,had a big impact lately on @GmNaroditsky. Got the stream episodes. Not a doctor but looks like something "very else" than sleeping pills. Hope,if any, real friends of him will care
47
403
7K
@stevenyuyy
Steven Yu
25 days
Glad you like it! It's the nano-protein-viewer that I built over the summer. Really need to work on marketing lol
@DdelAlamo
Diego del Alamo
25 days
@stevenyuyy Whoa what plugin is this? So much prettier than protein viewer
5
18
164
@dmitrypenzar
Dmitry Penzar
29 days
Great story
@BiologyAIDaily
Biology+AI Daily
1 month
Protein Language Models are Accidental Taxonomists 1. A new study revealing a significant issue in protein-protein interaction (PPI) prediction models. These models, which use protein language models (pLMs), have been found to exploit phylogenetic distances rather than genuine
0
0
1
@RDasLab
Das Lab
2 months
The results are in: top codes in Stanford #RNA 3D Folding @kaggle are competitive with CASP16-leading humans Vfold, beat AlphaFold 3. Top team’s trick was template-based modeling, not #DeepLearning. Congrats: john, odat, Eigen, + all 1706 participants! https://t.co/EgzN3DTKNe
1
14
76
@dmitrypenzar
Dmitry Penzar
2 months
While I'm still not convinced that Shorkie really needs the pretraining stuff and that the same result can't be achieved by careful selection of hyperparameters, I'm clearly amazed by the clarity of the work done and the honest comparison (not always in favor of Shorkie) with MPRA-trained models
@anshulkundaje
Anshul Kundaje (anshulkundaje@bluesky)
2 months
2 cool papers on sequence-to-gene expression models in yeast https://t.co/6ozI9gSb6x (pretrains a fungal DNALM -> fine tunes on yeast expression & ChIP-exo profiles) https://t.co/yLErrIjIS1 (directly trains on expression profiles) Both use modified Borzoi architectures 1/
0
0
0
@anshulkundaje
Anshul Kundaje (anshulkundaje@bluesky)
2 months
Another thing that is maybe less emphasized in this paper is that CLINVAR is a great database of curated pathogenic/benign variants but it is extremely biased (in all sorts of ways) & should never be used as a representative benchmark dataset for most types of variants. 1/
@BrandesNadav
Nadav Brandes
2 months
Latest genomic AI models report near-perfect prediction of pathogenic variants (e.g. AUROC>0.97 for Evo2). We ran extensive independent evals and found these figures are true, but very misleading. A breakdown of our new preprint: 🧵
1
21
166
@pkoo562
Peter Koo
2 months
2025 Machine Learning in Computational Biology (#MLCB) meeting starts TODAY (9/10) at 9:30 (EST)! We have a great lineup of keynotes, contributed talks, and posters today and tomorrow! Schedule: https://t.co/wN8z3SeD8Y Join for free via livestream:
mlcb.org
The in-person component will be held at the New York Genome Center, 101 6th Ave, New York, NY 10013. All times below are Eastern Time.
1
11
61
@BrandesNadav
Nadav Brandes
2 months
It’s basically Simpson's paradox. To illustrate what’s happening, let’s look at Evo2 for splice & 5’UTR variants. Neither group shows good separation between pathogenic & benign variants, but splice variants get more damaging predictions & are much more likely to be pathogenic.
1
1
16
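A minimal sketch of the Simpson's paradox effect described in the tweet above. All scores and group sizes here are made-up toy values, not data from the preprint, and the `auroc` helper is a hypothetical pairwise implementation written for this illustration. Within each variant class the scores carry no information about pathogenicity, yet the pooled AUROC looks strong because the class with higher scores is also the class with more pathogenic variants.

```python
from itertools import product


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy assumption: splice variants get high scores and are mostly pathogenic (18 of 20).
splice_pathogenic = [0.8, 0.9] * 9
splice_benign = [0.8, 0.9]

# Toy assumption: 5'UTR variants get low scores and are mostly benign (18 of 20).
utr_pathogenic = [0.1, 0.2]
utr_benign = [0.1, 0.2] * 9

# Within each class, pathogenic and benign share the same score distribution.
auroc_splice = auroc(splice_pathogenic, splice_benign)
auroc_utr = auroc(utr_pathogenic, utr_benign)

# Pooling the classes rewards the model for telling the classes apart.
auroc_pooled = auroc(
    splice_pathogenic + utr_pathogenic,
    splice_benign + utr_benign,
)

print(f"splice only: {auroc_splice:.2f}")  # 0.50
print(f"5'UTR only:  {auroc_utr:.2f}")     # 0.50
print(f"pooled:      {auroc_pooled:.2f}")  # 0.90
```

In this toy setup the pooled figure of 0.90 comes entirely from separating the two variant classes, which is why reporting AUROC per variant class is the more informative check.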
@anshulkundaje
Anshul Kundaje (anshulkundaje@bluesky)
2 months
The benchmark task is "batch correction" while preserving biological variation. This task was supposedly benchmarked in all these foundation model papers. But they r apparently very poor even for batch correction? What is going on?!?
6
1
28
@lpachter
Lior Pachter
2 months
In a new work with @Josephmrich and Conrad Oakes we tackle the problem of how to best organize alluvial plots. We formalize two optimization problems and develop a solution for them based on the neighbornet algorithm, implemented in the program wompwomp: https://t.co/njQRkjYHNh
2
19
64
@dmitrypenzar
Dmitry Penzar
2 months
9
21
267
@jmschreiber91
Jacob Schreiber
3 months
In the genomics community, we have focused pretty heavily on achieving state-of-the-art predictive performance. While undoubtedly important, how we *use* these models after training is potentially even more important. tangermeme v1.0.0 is out now. Hope you find it useful!
3
23
97
@jmschreiber91
Jacob Schreiber
3 months
An excellent post about the receptive range of convolution models. "You might reasonably ask: "If I have 100 layers with W=1000, that's a theoretical receptive field of 100,000 tokens. Doesn't that matter?" The answer is no, and here's why:" https://t.co/X1xDNudVZh
guangxuanx.com
Modern LLMs use sliding window attention for efficiency, but why can't stacking sliding windows see as far as theory suggests? A mathematical exploration of information dilution and the exponential...
1
2
18
@sarahgurev
Sarah Gurev
3 months
🚨New paper 🚨 Can protein language models help us fight viral outbreaks? Not yet. Here’s why 🧵👇 1/12
1
37
155
@dmitrypenzar
Dmitry Penzar
3 months
One can just check Phenformer's ROC-AUCs for many diseases in the supplements (https://t.co/jM8CBKRrID), keeping in mind that the model is evaluated in the most favorable setting (a split that does not account for population structure)
arxiv.org
Understanding how molecular changes caused by genetic variation drive disease risk is crucial for deciphering disease mechanisms. However, interpreting genome sequences is challenging because of...
@anshulkundaje
Anshul Kundaje (anshulkundaje@bluesky)
3 months
Utterly uninformed take. We don't know how to pinpoint causal variants accurately for polygenic phenotypes (AD, Diabetes etc.) & good luck editing 100s/1000s of variants in embryos without understanding their pleiotropic effects, oh & ignore those off-target effects.
0
1
1