Jacob Schreiber

@jmschreiber91

Followers
5K
Following
7K
Media
500
Statuses
5K

Guest Scientist @impvienna, Board of Directors @NumFOCUS, incoming prof @UMassGCB. Prev, @StanfordMed @uwcse. Studying genomics, machine learning, and fruit.

Vienna, Austria
Joined March 2017
@jmschreiber91
Jacob Schreiber
5 years
The more papers I read for a review article I'm writing about ML pitfalls in genomics, the more my faith is shaken in the results from papers that apply machine learning to methylation arrays. A salty thread. 1/
21
220
825
@jmschreiber91
Jacob Schreiber
2 days
that feeling that you're being watched
0
0
1
@jmschreiber91
Jacob Schreiber
4 days
Walking around Vienna and found Twitter
0
0
6
@jmschreiber91
Jacob Schreiber
12 days
Super excited to be on the way to #ISMB2025 @ISCB_RegSys! Who else is going?
1
0
10
@jmschreiber91
Jacob Schreiber
30 days
medium demand expected
1
0
1
@jmschreiber91
Jacob Schreiber
1 month
RT @jmschreiber91: Today marks the end of en-JUNE-eering, the month where I focused mostly on the nitty gritty of improving genomics ML inf…
0
4
0
@jmschreiber91
Jacob Schreiber
1 month
(5) A new tomtom-lite command-line tool that allows quick querying of motifs without needing to go to the Tomtom website.
@jmschreiber91
Jacob Schreiber
2 months
Last week I shared tomtom-lite, a super fast re-implementation of Tomtom for annotating short genomic spans with the motifs they most resemble. Now, there's a convenient command line tool `ttl` that comes with the installation. You can get it with `pip install memelite`.
0
0
3
@jmschreiber91
Jacob Schreiber
1 month
(4) bpnet-lite: Load Chrom/BPNet models from the official TensorFlow repos into PyTorch for downstream tangermeme integration. Improved command-line tools + docs. Still concerns about perf of models trained from scratch -- will be resolved next version!
@jmschreiber91
Jacob Schreiber
1 month
Last week I released bpnet-lite v0.5.0. Chrom/BPNet are powerful models for understanding regulatory genomics from @anshulkundaje's group, and now it's way easier to go from raw data to trained models and analysis/results in PyTorch. Try it out with `pip install bpnet-lite`.
1
0
2
@jmschreiber91
Jacob Schreiber
1 month
(3) tangermeme: significant quality-of-life improvements, fixing an issue with seqlet calling, plotting w/ annotations, and tomtom-lite integration across several functions.
github.com
Biological sequence analysis for the modern age. Contribute to jmschrei/tangermeme development by creating an account on GitHub.
@jmschreiber91
Jacob Schreiber
2 months
Just released tangermeme v0.5.0! tangermeme implements "everything-but-the-model" for genomic ML. Train your model your way using your code-base (or load someone else's model), and tangermeme handles the discovery + design with it. Try it out with `pip install tangermeme`.
1
0
1
@jmschreiber91
Jacob Schreiber
1 month
(2) bam2bw: a simple utility that allows you to go from BAMs -> un/stranded bigWigs without intermediary bedGraph files. Way less memory, disk, and hassle. Now extended to work on fragment files and .tsv/.gz, and depth normalize.
github.com
A command-line tool for reading SAM/BAM files and converting them directly to bigwig files. - jmschrei/bam2bw
@jmschreiber91
Jacob Schreiber
2 months
Released a new version of my little data processing tool, `bam2bw`. As you might expect, it goes directly from BAM to un/stranded bigwig(s), but now also supports fragment.tsv files. No more intermediary sorting and bedGraph steps! Way faster + less disk than before.
4
0
4
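For readers unfamiliar with depth normalization, here is a minimal sketch of one common convention: scaling coverage to counts per million reads. This is an illustration only; the function name `depth_normalize`, the `scale` parameter, and the example numbers are hypothetical, and the scaling that bam2bw actually applies is not stated in the posts above.

```python
def depth_normalize(coverage, total_reads, scale=1_000_000):
    """Scale per-position coverage so tracks from libraries of different
    sequencing depth are comparable (counts-per-million convention).

    `coverage` is a list of raw per-position read counts and `total_reads`
    is the number of reads in the library. Both are illustrative inputs,
    not values read from a real BAM file.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    factor = scale / total_reads
    return [count * factor for count in coverage]


# Hypothetical example: four positions from a library of two million reads.
raw_coverage = [0, 2, 4, 1]
normalized = depth_normalize(raw_coverage, total_reads=2_000_000)
print(normalized)
```

With two million reads the scaling factor is 0.5, so every raw count is halved.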
@jmschreiber91
Jacob Schreiber
1 month
(1) tomtom-lite: a significantly faster implementation of the original tomtom algorithm that can be over 1000x faster. Now built-in to a variety of my other tools.
github.com
A lightweight reimplementation of some of the algorithms in the MEME suite in Python. - jmschrei/memesuite-lite
@jmschreiber91
Jacob Schreiber
2 months
I wrote a quick application note on Tomtom-lite, a Python implementation of the Tomtom algorithm for comparing PWMs against each other. This implementation can be 10-1000x faster and, as a Python function, can be more easily integrated into your workflows.
1
0
2
@jmschreiber91
Jacob Schreiber
1 month
Today marks the end of en-JUNE-eering, the month where I focused mostly on the nitty gritty of improving genomics ML infrastructure. Here are some of the highlights:
1
4
31
@jmschreiber91
Jacob Schreiber
1 month
Thank you, Google Flights, for recommending this 10 hour layover in Athens first by "convenience" when trying to find a Frankfurt -> Vienna flight.
0
0
0
@jmschreiber91
Jacob Schreiber
1 month
Has genomics gone too far.
@ShouldHaveCat
Why you should have a cat
1 month
is this real?
1
0
9
@jmschreiber91
Jacob Schreiber
1 month
This evaluation of DNA design methods is very well written. If you're interested in the field, you should def take a look. Also, glad to see Ledidi performing so well!
biorxiv.org
One outstanding open problem with high therapeutic value is how to design nucleic acid sequences with specific properties. Even just the 5’ UTR sequence admits 2 × 10^120 possibilities, making...
0
8
54
@jmschreiber91
Jacob Schreiber
1 month
(6) Several minor code re-orgs and changes have been added. You can now use any dtype and device for the steps, allowing you to use a CPU if necessary or do half-precision for large-scale prediction.
0
0
0
@jmschreiber91
Jacob Schreiber
1 month
Although there are some challenges with simply mapping seqlets to motif databases, this can be viewed as a fast + dirty alternative to the robust de novo motif discovery of TF-MoDISco. It'll just give you a sense for what your model has learned (if anything at all)!
1
0
1
@jmschreiber91
Jacob Schreiber
1 month
When running the pipeline, the seqlets will be annotated using tomtom-lite + motif database, and counted so you get the top driving motifs. For example, for a CTCF model, here are the seqlet counts when mapped to JASPAR, with MET28 overlapping one of the fingers in CTCF.
1
0
1
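The counting step described in this post can be sketched roughly as follows, assuming each seqlet has already been assigned the name of its best-matching motif. The list of annotations and the helper `top_motifs` are hypothetical and made up for illustration; this is not tangermeme's or tomtom-lite's API, and the counts are not real results.

```python
from collections import Counter

# Hypothetical best-match motif name for each seqlet (illustrative values only,
# not output from a real model or from a JASPAR query).
seqlet_annotations = ["CTCF", "CTCF", "MET28", "CTCF", "MET28", "SP1"]


def top_motifs(annotations, n=3):
    """Count how many seqlets map to each motif and return the n most frequent
    as (motif_name, count) pairs, most frequent first."""
    return Counter(annotations).most_common(n)


print(top_motifs(seqlet_annotations))
```

Motifs that appear near the top of such a tally are the ones the model's attributions point to most often, which is the "top driving motifs" summary mentioned above.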
@jmschreiber91
Jacob Schreiber
1 month
Note that seqlets for negative attributions are trickier than for positive attributions because there are fewer negative attributions and real negative seqlets overall. More work will be done to make this more robust in the future.
1
0
0