Dmitry Penzar
@dmitrypenzar
Followers: 515
Following: 5K
Media: 53
Statuses: 534
PhD in bioinformatics, ML researcher, teacher
Joined May 2016
(1/8) The LegNet paper is finally published https://t.co/ROOParhEwq. Congrats to @halfacrocodile @WWenya @DariaNogina @ZinkevichA
academic.oup.com
Abstract. Motivation. The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of comple
4
16
60
Announcing our new preprint! We built SPICE, a framework that combines large-scale experiments and generative AI to design RNA sequences that control cell type-specific gene expression using alternative splicing - a powerful new modality! (1/10) Preprint:
biorxiv.org
Programmable control of gene expression in specific cell types is essential for both basic discovery and therapeutic intervention, yet current strategies lack scalability across diverse cellular...
2
18
90
Now we just need to find those scientists. I’d start with Bluesky — maybe find one specialist to make their newsfeed a bit less of a mess
0
0
2
We are deeply saddened by the unexpected passing of Grandmaster Daniel Naroditsky. Daniel was not only a friend of the Saint Louis Chess Club, but a gifted player, educator, and beloved pillar of the chess community. His passion for the game and commitment to teaching inspired
44
247
3K
We lost an educator, a player, a coach, and one of the best speed chess players of this generation. Rest in peace Naroditsky, we all miss you. I'm also going to leave this clip here, it speaks for itself.
Seemingly, conflicts with @chesscom, @freestylechess1, both kicking him out from commentator role, had a big impact lately on @GmNaroditsky. Got the stream episodes. Not a doctor but looks like something "very else" than sleeping pills. Hope, if any, real friends of him will care
47
403
7K
Glad you like it! It's the nano-protein-viewer that I built over the summer Really need to work on marketing lol
@stevenyuyy Whoa what plugin is this? So much prettier than protein viewer
5
18
164
Colleagues have also pointed out this paper
academic.oup.com
Abstract. Identifying protein–protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed a
0
0
3
The results are in: top codes in Stanford #RNA 3D Folding @kaggle are competitive with CASP16-leading humans Vfold, beat AlphaFold 3. Top team’s trick was template-based modeling, not #DeepLearning. Congrats: john, odat, Eigen, + all 1706 participants! https://t.co/EgzN3DTKNe
1
14
76
While I'm still not convinced that Shorkie really needs the pretraining stuff and that the same result can't be achieved by careful selection of hyperparameters, I'm clearly amazed by the clarity of the work done and the honest comparison (not always in favor of Shorkie) with MPRA-trained models
2 cool papers on sequence-to-gene expression models in yeast https://t.co/6ozI9gSb6x (pretrains a fungal DNALM -> fine tunes on yeast expression & ChIP-exo profiles) https://t.co/yLErrIjIS1 (directly trains on expression profiles) Both use modified Borzoi architectures 1/
0
0
0
Another thing that is maybe less emphasized in this paper is that CLINVAR is a great database of curated pathogenic/benign variants but it is extremely biased (in all sorts of ways) & should never be used as a representative benchmark dataset for most types of variants. 1/
Latest genomic AI models report near-perfect prediction of pathogenic variants (e.g. AUROC>0.97 for Evo2). We ran extensive independent evals and found these figures are true, but very misleading. A breakdown of our new preprint: 🧵
1
21
166
2025 Machine Learning in Computational Biology (#MLCB) meeting starts TODAY (9/10) at 9:30 (EST)! We have a great lineup of keynotes, contributed talks, and posters today and tomorrow! Schedule: https://t.co/wN8z3SeD8Y Join for free via livestream:
mlcb.org
The in-person component will be held at the New York Genome Center, 101 6th Ave, New York, NY 10013. All times below are Eastern Time.
1
11
61
It’s basically Simpson's paradox. To illustrate what’s happening, let’s look at Evo2 for splice & 5’UTR variants. Neither group shows good separation between pathogenic & benign variants, but splice variants get more damaging predictions & are much more likely to be pathogenic.
1
1
16
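A minimal sketch of the Simpson's-paradox effect described in the post above, using synthetic data only. Everything here is an assumption made for illustration: the group sizes, the 90%/10% pathogenic fractions, the Gaussian score distributions, and the helper names `auroc` and `simulate_group`. None of it comes from Evo2 or the preprint. Within each group the score is drawn independently of the label, so per-group AUROC sits near 0.5, yet pooling the groups yields a much higher AUROC because the group with higher scores is also the group with more pathogenic variants.

```python
import random


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum, n_pos = 0, 0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            rank_sum += rank
            n_pos += 1
    n_neg = len(scores) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_group(rng, n, frac_pathogenic, mean_score):
    """Hypothetical variant group: scores do NOT depend on the label."""
    n_path = round(n * frac_pathogenic)
    labels = [1] * n_path + [0] * (n - n_path)
    scores = [rng.gauss(mean_score, 1.0) for _ in labels]
    return scores, labels


rng = random.Random(0)
# Assumed toy setting: "splice" variants get higher scores and are mostly
# pathogenic; "5'UTR" variants get lower scores and are mostly benign.
splice_scores, splice_labels = simulate_group(rng, 2000, 0.9, 3.0)
utr_scores, utr_labels = simulate_group(rng, 2000, 0.1, 0.0)

auc_splice = auroc(splice_scores, splice_labels)
auc_utr = auroc(utr_scores, utr_labels)
auc_pooled = auroc(splice_scores + utr_scores, splice_labels + utr_labels)

print(f"splice only: {auc_splice:.3f}")
print(f"5'UTR only:  {auc_utr:.3f}")
print(f"pooled:      {auc_pooled:.3f}")
```

In this toy setting the pooled number mostly reflects which group a variant belongs to, not whether the score separates pathogenic from benign variants within a group.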
The benchmark task is "batch correction" while preserving biological variation. This task was supposedly benchmarked in all these foundation model papers. But they are apparently very poor even for batch correction? What is going on?!?
6
1
28
In a new work with @Josephmrich and Conrad Oakes we tackle the problem of how to best organize alluvial plots. We formalize two optimization problems and develop a solution for them based on the neighbornet algorithm, implemented in the program wompwomp: https://t.co/njQRkjYHNh
2
19
64
Excited to release our study on the emergence of new promoters in random vs genomic DNA. Posting the thread on the other place :) https://t.co/QpzSAmrMky
biorxiv.org
Promoters are DNA sequences that help to initiate transcription. Point mutations can create de-novo promoters, which can consequently transcribe inactive genes or create novel transcripts. We know...
3
18
103
In the genomics community, we have focused pretty heavily on achieving state-of-the-art predictive performance. While undoubtedly important, how we *use* these models after training is potentially even more important. tangermeme v1.0.0 is out now. Hope you find it useful!
3
23
97
An excellent post about the receptive range of convolution models. "You might reasonably ask: "If I have 100 layers with W=1000, that's a theoretical receptive field of 100,000 tokens. Doesn't that matter?" The answer is no, and here's why:" https://t.co/X1xDNudVZh
guangxuanx.com
Modern LLMs use sliding window attention for efficiency, but why can't stacking sliding windows see as far as theory suggests? A mathematical exploration of information dilution and the exponential...
1
2
18
🚨New paper 🚨 Can protein language models help us fight viral outbreaks? Not yet. Here’s why 🧵👇 1/12
1
37
155
One can just check Phenformer ROC-AUCs for many diseases in the supplements (https://t.co/jM8CBKRrID), keeping in mind that the model is evaluated in the most friendly setting (split without accounting for population structure)
arxiv.org
Understanding how molecular changes caused by genetic variation drive disease risk is crucial for deciphering disease mechanisms. However, interpreting genome sequences is challenging because of...
Utterly uninformed take. We don't know how to pinpoint causal variants accurately for polygenic phenotypes (AD, Diabetes etc.) & good luck editing 100s/1000s of variants in embryos without understanding their pleiotropic effects, oh & ignore those off-target effects.
0
1
1