Johannes Linder @jjohlin X Profile

Johannes Linder

@jjohlin

Followers

264

Following

19

Media

4

Statuses

25

Biology + Machine Learning.

Joined April 2018

Johannes Linder

@jjohlin

10 months

(Rather than the models being a cross-fold ensemble, they were trained with identical train/test splits and only the random weight initialization and batched sequence order varied.) 8/

0

1
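The distinction drawn in 8/ between a cross-fold ensemble and a replicate ensemble can be sketched as below. This is a toy illustration only, not the Borzoi training code: the function names, the dictionary fields, and the use of integer indices as "examples" are all hypothetical.

```python
import random

def make_replicate_configs(n_examples, n_models, test_fraction=0.2, split_seed=0):
    """Replicate ensemble: every model shares one train/test split.

    Only the per-model seed differs; in a real setup it would drive the
    random weight initialization and the batch order (assumed here).
    """
    indices = list(range(n_examples))
    random.Random(split_seed).shuffle(indices)
    n_test = int(n_examples * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    configs = []
    for model_seed in range(n_models):
        order = train[:]                          # same training examples...
        random.Random(model_seed).shuffle(order)  # ...in a model-specific order
        configs.append({"seed": model_seed, "train_order": order, "test": test})
    return configs

def make_crossfold_configs(n_examples, n_folds, split_seed=0):
    """Cross-fold ensemble: each model holds out a different fold."""
    indices = list(range(n_examples))
    random.Random(split_seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    configs = []
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        configs.append({"fold": k, "train_order": train, "test": folds[k]})
    return configs

replicates = make_replicate_configs(n_examples=20, n_models=4)
crossfold = make_crossfold_configs(n_examples=20, n_folds=4)
```

In the replicate setup the held-out set is identical across models, so ensemble members never see each other's test data; in the cross-fold setup each example is held out by exactly one model.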

Johannes Linder

@jjohlin

10 months

Finally, we apologize for the confusion and wasted time early users experienced due to a training script bug that caused us to misstate the train/valid/test splits of the model in the preprint. The published manuscript and github now accurately describe the split. 7/

1

0

2

Johannes Linder

@jjohlin

10 months

- Flashzoi (https://t.co/5FGuGawnPx): Efficient Borzoi in PyTorch. - Decima (https://t.co/0X0wBt417X): Transfer-learning to single-cell atlas expression data. - gReLU (https://t.co/raLnBFWTZl): Software suite for training, interpretation, design. 6/

1

0

2

Johannes Linder

@jjohlin

10 months

Second, a number of excellent follow-up tools & research by other groups have emerged, using the pre-trained Borzoi model weights as a backbone. Here is a (possibly incomplete) list of them: - scooby (https://t.co/OKN7vk6Ojv): Transfer-learning to single-cell multiome data. 5/

1

0

1

Johannes Linder

@jjohlin

10 months

This highlights the importance of all the publicly available resources we used for training, and we want to give a shoutout to those consortia. ENCODE (https://t.co/WL3xjX8iY5), GTEx (https://t.co/pV7xmfe6QI), FANTOM (https://t.co/kXgZOrNE76), CatLAS (https://t.co/b4aHenjNiY) 4/

1

0

Johannes Linder

@jjohlin

10 months

In the revision, we performed ablation experiments where Borzoi was retrained to predict RNA-seq coverage alone, without any of the auxiliary data it was originally trained with (e.g. DNase). As it turns out, the model’s generalization performance for RNA-seq drops noticeably. 3/

1

0

1

Johannes Linder

@jjohlin

10 months

Original tweet (preprint): https://t.co/LSrAq56QMX There are a couple of developments since the preprint worth highlighting. 2/

David Kelley

@drklly

2 years

Check our new paper “Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation”.

1

0

Johannes Linder

@jjohlin

10 months

The Borzoi manuscript is now out in Nature Genetics: https://t.co/e6WcfztXx3 Borzoi predicts RNA-seq profiles in many tissues & cell types from DNA sequence as its only input. With it, we can score the impact of genetic variants on a number of gene-regulatory functions. 1/

nature.com

Nature Genetics - Borzoi adapts the Enformer sequence-to-expression model to directly predict RNA-seq coverage, enabling the in-silico analysis of variant effects across multiple layers of gene...

2

44

173

David Kelley

@drklly

2 years

Check our new paper “Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation”.

biorxiv.org

Sequence-based machine learning models trained on genome-scale biochemical assays improve our ability to interpret genetic variants by providing functional predictions describing their impact on the...

11

127

445

Rahul Satija

@satijalab

3 years

New preprint w/@anshulkundaje introducing CPA-Perturb-seq! We systematically perturb regulators of cleavage and polyadenylation, and explore post-transcriptional changes at single-cell resolution. Led by @mh_kowalski @harm__w and @jjohlin (🧵) https://t.co/GQwKMQhlpX

4

82

302

Georg Seelig

@seeliglab

3 years

Very excited for MP3-seq, a new high-throughput Y2H approach we use to screen de novo protein heterodimer interactions. Fantastic work by Alex Baryshev, Alyssa La Fleur, @benjaminbgroves, @CirstynMichel, David Baker @UWproteindesign, @AjasjaLjubetic https://t.co/5JPKw5Ly2m

0

54

203

David Kelley

@drklly

3 years

Excited to highlight @Calico’s 2023 summer internship program, which my group will be participating in! If you’re interested in gaining experience with deep learning models in regulatory genomics, consider applying to join us here:

calicolabs.com

5

26

84

Nature Methods

@naturemethods

3 years

CaRPool-seq from @satijalab, @nevillesanjana and colleagues makes use of the RNA-targeting CRISPR-Cas13d system to perform combinatorial perturbations in single-cell screens. https://t.co/4AiuMSkR2H

0

33

129

Genome Biology

@GenomeBiology

3 years

@vagar112 & @drklly describe Saluki, which is capable of predicting the effects of mRNA sequences and genetic variants on mRNA stability 50% more accurately relative to existing models in mammals. https://t.co/gfvVB6YiT1

0

5

11

Johannes Linder

@jjohlin

3 years

In this peer-reviewed version of the paper, @SamanthaKoplik experimentally assayed clinically relevant 3’ UTR variants in an MPRA in multiple cell lines and validated many of our predictions.

0

1

4

Johannes Linder

@jjohlin

3 years

APARENT2, our latest model for scoring and interpreting the effects of variants on 3’ UTR polyadenylation, was published in Genome Biology: https://t.co/9GtahLK5mo. Great collaboration with @seeliglab, @anshulkundaje and @SamanthaKoplik.

genomebiology.biomedcentral.com

Background 3′-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to...

2

20

69

Johannes Linder

@jjohlin

4 years

Presenting this work at #bog22 Thursday May 12th 9am ET!

0

1

Johannes Linder

@jjohlin

4 years

Figure 5: When performing in-silico mutagenesis of all human polyA signals, we find that loss-of-polyA variants are depleted among common variants. In contrast, we detect an enrichment of gain-of-polyA mutations in individuals with autism in WGS cohort data. (4/4)

1

2

1
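The in-silico mutagenesis mentioned in (4/4) can be sketched in a few lines: substitute every base at every position, re-score the sequence, and record the change relative to the reference. The sketch below is not APARENT2. `toy_polya_score` is a hypothetical stand-in that simply counts the canonical AATAAA polyadenylation signal hexamer, and the example sequence is made up.

```python
BASES = "ACGT"

def toy_polya_score(seq):
    # Hypothetical stand-in for a trained model: count canonical AATAAA hexamers.
    return sum(seq[i:i + 6] == "AATAAA" for i in range(len(seq) - 5))

def in_silico_mutagenesis(seq, score_fn):
    """Score every single-nucleotide substitution against the reference."""
    ref_score = score_fn(seq)
    effects = []
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects.append((pos, ref_base, alt, score_fn(mutant) - ref_score))
    return effects

# Made-up sequence with one intact signal and one near-miss (AATACA).
seq = "GGCAATAAAGCTTCAATACATG"
effects = in_silico_mutagenesis(seq, toy_polya_score)
loss_of_polya = [e for e in effects if e[3] < 0]  # substitutions that break a signal
gain_of_polya = [e for e in effects if e[3] > 0]  # substitutions that create one
```

With a real model, `score_fn` would return a predicted polyadenylation strength, and the loss and gain sets would be defined by a threshold on the effect size rather than by its sign.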

Johannes Linder

@jjohlin

4 years

Figure 3: APARENT2’s variant predictions correlate strongly with 3’ aQTL effect sizes from GTEx. By learning residual models of tissue-specific regulation from endogenous data, we can better predict tissue-specific aQTLs. (3/4)

1

3

2

Johannes Linder

@jjohlin

4 years

Figure 2: A deep residual NN called APARENT2 predicts polyA variant effect sizes measured in MPRAs more accurately than previous models. Mask-based attribution allows us to catalogue epistatic feature interactions disrupted by clinically relevant mutations. (2/4)

1

2