Johannes Linder
@jjohlin
(Rather than the models being a cross-fold ensemble, they were trained with identical train/test splits and only the random weight initialization and batched sequence order varied.) 8/
Finally, we apologize for the confusion and wasted time early users experienced due to a training script bug that caused us to misstate the train/valid/test splits of the model in the preprint. The published manuscript and GitHub repository now accurately describe the split. 7/
- Flashzoi (https://t.co/5FGuGawnPx): Efficient Borzoi in PyTorch.
- Decima (https://t.co/0X0wBt417X): Transfer learning to single-cell atlas expression data.
- gReLU (https://t.co/raLnBFWTZl): Software suite for training, interpretation, and design. 6/
Second, a number of excellent follow-up tools and research projects from other groups have emerged that use the pre-trained Borzoi model weights as a backbone. Here is a (possibly incomplete) list:
- scooby (https://t.co/OKN7vk6Ojv): Transfer learning to single-cell multiome data. 5/
This highlights the importance of all the publicly available resources we used for training, and we want to give a shoutout to those consortia: ENCODE (https://t.co/WL3xjX8iY5), GTEx (https://t.co/pV7xmfe6QI), FANTOM (https://t.co/kXgZOrNE76), and CATlas (https://t.co/b4aHenjNiY). 4/
In the revision, we performed ablation experiments where Borzoi was retrained to predict RNA-seq coverage alone, without any of the auxiliary data it was originally trained with (e.g. DNase). As it turns out, the model’s generalization performance for RNA-seq drops noticeably. 3/
Original tweet (preprint): https://t.co/LSrAq56QMX There are a couple of developments since the preprint worth highlighting. 2/
The Borzoi manuscript is now out in Nature Genetics: https://t.co/e6WcfztXx3 Borzoi predicts RNA-seq profiles in many tissues & cell types from DNA sequence as its only input. With it, we can score the impact of genetic variants on a number of gene-regulatory functions. 1/
nature.com
Nature Genetics - Borzoi adapts the Enformer sequence-to-expression model to directly predict RNA-seq coverage, enabling the in-silico analysis of variant effects across multiple layers of gene...
Check out our new paper “Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation”.
biorxiv.org
Sequence-based machine learning models trained on genome-scale biochemical assays improve our ability to interpret genetic variants by providing functional predictions describing their impact on the...
New preprint w/@anshulkundaje introducing CPA-Perturb-seq! We systematically perturb regulators of cleavage and polyadenylation, and explore post-transcriptional changes at single-cell resolution. Led by @mh_kowalski @harm__w and @jjohlin (🧵) https://t.co/GQwKMQhlpX
Very excited for MP3-seq, a new high-throughput Y2H approach we use to screen de novo protein heterodimer interactions. Fantastic work by Alex Baryshev, Alyssa La Fleur, @benjaminbgroves, @CirstynMichel, David Baker @UWproteindesign, and @AjasjaLjubetic.
https://t.co/5JPKw5Ly2m
Excited to highlight @Calico’s 2023 summer internship program, which my group will be participating in! If you’re interested in gaining experience with deep learning models in regulatory genomics, consider applying to join us here:
calicolabs.com
CaRPool-seq from @satijalab, @nevillesanjana and colleagues makes use of the RNA-targeting CRISPR-Cas13d system to perform combinatorial perturbations in single-cell screens. https://t.co/4AiuMSkR2H
@vagar112 & @drklly describe Saluki, which predicts the effects of mRNA sequences and genetic variants on mRNA stability in mammals 50% more accurately than existing models. https://t.co/gfvVB6YiT1
In this peer-reviewed version of the paper, @SamanthaKoplik experimentally assayed clinically relevant 3’ UTR variants in an MPRA in multiple cell lines and validated many of our predictions.
APARENT2, our latest model for scoring and interpreting the effects of variants on 3’ UTR polyadenylation, was published in Genome Biology: https://t.co/9GtahLK5mo. Great collaboration with @seeliglab, @anshulkundaje and @SamanthaKoplik.
genomebiology.biomedcentral.com
Background 3′-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to...
Figure 5: When performing in-silico mutagenesis of all human polyA signals, we find that loss-of-polyA mutations are depleted among common variants. In contrast, we detect an enrichment of gain-of-polyA mutations in individuals with autism in WGS cohort data. (4/4)
Figure 3: APARENT2’s variant predictions correlate strongly with 3’ aQTL effect sizes from GTEx. By learning residual models of tissue-specific regulation from endogenous data, we can better predict tissue-specific aQTLs. (3/4)
Figure 2: A deep residual NN called APARENT2 predicts polyA variant effect sizes measured in MPRAs more accurately than previous models. Mask-based attribution allows us to catalogue epistatic feature interactions disrupted by clinically relevant mutations. (2/4)