Mikhail Karasikov

@m_karasikov

Followers: 119
Following: 60
Media: 8
Statuses: 36

ML Engineer developing vision foundation models. PhD in CS, genome graphs, compressed data structures

Zurich
Joined November 2020
@akkah21
Andre Kahles
25 days
After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework, which enables petabase-scale search across sequencing data, has been published today in Nature (https://t.co/WQgDjIYDZL).
nature.com
Nature - MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.
1
25
61
@m_karasikov
Mikhail Karasikov
29 days
Glad this resonated with the committee, and grateful to @sebastianffx for presenting it last week at MICCAI 2025. Hopefully, it sparked some insightful discussions and ideas!
0
1
1
@m_karasikov
Mikhail Karasikov
6 months
Happy to share that this work was early accepted for #MICCAI2025 and will be presented this September in Daejeon, South Korea. It will also be outlined at #ECDP2025 this June in Barcelona.
1
0
1
@m_karasikov
Mikhail Karasikov
7 months
Achieving top results with little data suggests that current algorithms don't fully exploit the information in truly large data sets. I tend to think that there still remains huge unrealized potential, and new algorithms are needed to bring this to the next level.
1
0
0
@m_karasikov
Mikhail Karasikov
7 months
We also explored post-training techniques to further boost the FMs and pushed our model to the top-1 spot.
1
0
0
@m_karasikov
Mikhail Karasikov
7 months
How much data is enough to train a SOTA-level pathology foundation model? In our new work https://t.co/rTUvygSBAx, we show that all recent models perform similarly, and that even 12k WSIs from TCGA are enough to outperform most of them. w/ @JoostvDoorn @Huugie76 @sebastianffx and the team
1
2
6
@nomad421
𝕐
1 year
Really nice writeup on the hugely impressive MetaGraph work by @gxr, @akkah21, @m_karasikov, @HarunMustafa416 (& others whose handles I don't know) in @ScienceMagazine: https://t.co/3h3uCgWsaU. Some comments by @ZaminIqbal, Lesley Hoyles and myself! Congrats @m_karasikov & team!
science.org
Achievement demonstrates feasibility of making all of life’s code easily searchable, researchers say
4
19
58
@ISBSIB
SIB
2 years
“A major step to making DNA sequencing data accessible to wider audiences.” 🥁 That's what the committee said about this work, one of the #SIBRemarkableOutputs 2022 👏 👉 Find out more: https://t.co/Mh7BbERQCz #genomics @m_karasikov
1
2
4
@curious_coding
Ragnar {Groot Koerkamp} 🦋
3 years
After half a year of "next week", "tomorrow", and "today", I'm glad that our (@peshotrie and me) preprint on exact global alignment is finally online! This thread visually summarizes our paper and next steps. 1/9 https://t.co/IH1KajQpsf
@peshotrie
Pesho Ivanov 🇺🇦
3 years
@curious_coding and I extended the seed heuristic to exact alignment of long (Mbps) erroneous (≤15%) sequences. The empirical near-linear runtime makes our aligner A*PA 250x faster than Edlib and WFA on synthetic data, and it looks promising on human data. https://t.co/YmqdAOml32
3
7
43
@m_karasikov
Mikhail Karasikov
3 years
Awesome project! Grateful for the chance to make my humble contribution as well. The data is indexed with MetaGraph and ready for search and alignment: 6.4M genomes -> 15 GB index with k-mer coordinates (CountingDBG); all 318M assembled scaffolds -> 124 GB index, 121 billion k-mers.
@SunagawaLab
Microbiome Research
3 years
Delighted to share our latest publication on the ‘biosynthetic potential of the global ocean microbiome’ in @Nature https://t.co/b4kgOLIROD. If you want to know more, have a look at this video:
0
1
10
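The index figures quoted in the MetaGraph tweet above allow a quick back-of-the-envelope estimate of how compact the index is. This sketch assumes "GB" means 10^9 bytes (the tweet does not say) and divides the whole index size by the number of items, so the results are all-in averages, not the cost of the k-mer dictionary alone.

```python
# Average index cost per item, from the figures quoted in the tweet.
# Assumption: "GB" means 10**9 bytes; the tweet does not specify.
GB = 10**9

def bits_per_item(index_bytes: float, n_items: float) -> float:
    """Average number of bits the index spends per item."""
    return index_bytes * 8 / n_items

# All 318M assembled scaffolds: 124 GB index, 121 billion k-mers.
kmer_cost = bits_per_item(124 * GB, 121e9)
# 6.4M genomes: 15 GB index (including k-mer coordinates).
genome_cost_kb = 15 * GB / 6.4e6 / 1000

print(f"~{kmer_cost:.1f} bits per k-mer")      # ~8.2 bits per k-mer
print(f"~{genome_cost_kb:.1f} kB per genome")  # ~2.3 kB per genome
```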
@giulio_pibiri
Giulio Ermanno Pibiri
4 years
I've created a "Crash Course on Data Compression" that I'm going to teach next week to PhD students in Pisa (20h, 5 modules). 1/ Link: https://t.co/8pNqxvZHcN #compression #DataScience #coding
github.com
🗜 💻 A crash course on Data Compression. Contribute to jermp/data_compression_course development by creating an account on GitHub.
3
22
102
@dominik_kempa
Dominik Kempa
4 years
If you are interested in learning how compression can be used to speed up algorithms and design smaller data structures, join our online workshop "Compression + Computation" on 01/19 (Wednesday), 10am-6pm EST (registration is free but required to join)! https://t.co/MYDY1J4qqL
sites.google.com
Overview Many modern applications produce massive datasets containing a lot of redundancy, either in the form of highly skewed frequencies or repeating motifs/fragments of identical data. Prominent...
3
10
47
@m_karasikov
Mikhail Karasikov
4 years
For encoding RNA expression levels, it can also be turned into a k-mer count dictionary: 8x smaller than the state of the art, yet much faster to query.
0
0
5
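The thread does not spell out how the Counting DBG is queried as a k-mer count dictionary. As a reference point only, here is an uncompressed stand-in with the same query interface: a plain `Counter` from k-mer to occurrence count, plus a naive expression estimate (the mean k-mer count along a query transcript). The function names and the mean-count estimate are illustrative assumptions, not MetaGraph's API.

```python
from collections import Counter
from statistics import mean

def count_kmers(reads, k):
    """Uncompressed k-mer count dictionary: k-mer -> number of occurrences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_counts_along(query, counts, k):
    """Count of every k-mer of `query`, in order (0 if the k-mer is unseen)."""
    return [counts.get(query[i:i + k], 0) for i in range(len(query) - k + 1)]

def naive_expression(query, counts, k):
    """Hypothetical expression estimate: mean k-mer count along the query."""
    return mean(kmer_counts_along(query, counts, k))

counts = count_kmers(["ACGTACGT", "ACGTT"], k=3)
print(kmer_counts_along("ACGTT", counts, 3))  # [3, 3, 1]
```

A compressed index would answer the same per-k-mer lookups; only the storage behind `counts` changes.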
@m_karasikov
Mikhail Karasikov
4 years
To demonstrate the new opportunities, we designed a sequence-to-graph alignment algorithm on top of Counting de Bruijn graphs, with a modified backtracking stage that ensures consistency with the sequences encoded in the graph (traces), developed by @HarunMustafa416.
1
0
1
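The point of trace-consistent backtracking can be illustrated on a toy index: a walk through a de Bruijn graph may be valid as a chain of overlapping k-mers and still never occur in any input sequence (a chimera). The sketch below keeps, for each k-mer, the (trace, offset) pairs where it occurs, and accepts a path only if a single trace visits its k-mers at consecutive offsets. It assumes uncompressed coordinate sets and is a simplified illustration, not the alignment algorithm from this work.

```python
from collections import defaultdict

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def index_traces(traces, k):
    """Map each k-mer to the (trace_id, offset) pairs where it occurs."""
    coords = defaultdict(set)
    for t, seq in enumerate(traces):
        for i, kmer in enumerate(kmers(seq, k)):
            coords[kmer].add((t, i))
    return coords

def is_trace_consistent(path, coords):
    """True if some trace spells the k-mers of `path` at consecutive offsets."""
    if not path:
        return False
    live = set(coords.get(path[0], ()))  # candidate (trace, start offset) pairs
    for step, kmer in enumerate(path[1:], start=1):
        here = coords.get(kmer, set())
        live = {(t, p) for t, p in live if (t, p + step) in here}
        if not live:
            return False
    return True

traces = ["ACGTTG", "CCGTAA"]
coords = index_traces(traces, k=3)
# ACG -> CGT -> GTA is a valid walk in the de Bruijn graph (CGT is shared),
# but no input sequence contains ACGTA, so it is rejected.
print(is_trace_consistent(kmers("ACGTA", 3), coords))  # False
print(is_trace_consistent(kmers("CCGTA", 3), coords))  # True
```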
@m_karasikov
Mikhail Karasikov
4 years
The method encodes traces in the underlying DBG, playing a role similar to that of the gPBWT in variation graphs. One of the crucial differences is the coding technique: while the gPBWT encodes each path by storing its "turns", the Counting DBG applies a delta-like coding to global coordinates.
1
0
2
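One possible reading of "delta-like coding on global coordinates" is sketched below under simplifying assumptions: global coordinates are positions in the concatenation of all traces, each node is linked to one chosen successor (here, the lexicographically smallest out-neighbour), and a node stores only the coordinates that are not predicted by subtracting 1 from its successor's coordinates (a symmetric difference). Nodes without a successor store their coordinates in full. The toy graph is acyclic, so decoding terminates; the actual Counting DBG scheme is more involved.

```python
def kmer_coordinates(traces, k):
    """Node (k-mer) -> set of global coordinates in the concatenated traces."""
    coords, base = {}, 0
    for seq in traces:
        for i in range(len(seq) - k + 1):
            coords.setdefault(seq[i:i + k], set()).add(base + i)
        base += len(seq)
    return coords

def choose_successors(nodes):
    """Pick one out-neighbour per node (smallest k-mer with a (k-1)-overlap)."""
    succ = {}
    for v in nodes:
        outs = sorted(u for u in nodes if u[:-1] == v[1:])
        succ[v] = outs[0] if outs else None
    return succ

def encode(coords, succ):
    """Store coords(v) XOR (coords(succ(v)) - 1); the full set if no successor."""
    return {
        v: c if succ[v] is None else c ^ {x - 1 for x in coords[succ[v]]}
        for v, c in coords.items()
    }

def decode(v, residuals, succ):
    """Recover coords(v) by walking successors (assumes the walk has no cycles)."""
    if succ[v] is None:
        return set(residuals[v])
    return residuals[v] ^ {x - 1 for x in decode(succ[v], residuals, succ)}

coords = kmer_coordinates(["ACGTTG", "CCGTAA"], k=3)
succ = choose_successors(coords)
residuals = encode(coords, succ)
print(sum(map(len, coords.values())), "->", sum(map(len, residuals.values())))  # 8 -> 5
```

Along a trace, consecutive nodes carry coordinates that differ by exactly 1, so most entries cancel; only branching points and trace ends leave residuals.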
@m_karasikov
Mikhail Karasikov
4 years
We call this data structure a Counting de Bruijn graph. On average, it compresses even better than gzip: only 0.54 bits/bp for long HiFi reads.
1
0
2
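To put 0.54 bits/bp in perspective, the sketch below converts bits per base pair into storage for a hypothetical 1 Gbp read set and compares it with plain ASCII (8 bits/bp) and a 2-bit nucleotide encoding. The 1 Gbp size is an arbitrary example, and the gzip figure is not reproduced because the tweet does not give it.

```python
def storage_mb(n_bp: int, bits_per_bp: float) -> float:
    """Storage in megabytes (10**6 bytes) for n_bp bases at a given rate."""
    return n_bp * bits_per_bp / 8 / 1e6

n_bp = 10**9  # hypothetical 1 Gbp read set
for name, rate in [("ASCII", 8), ("2-bit", 2), ("Counting DBG", 0.54)]:
    print(f"{name:>12}: {storage_mb(n_bp, rate):7.1f} MB")
```

This gives about 1000, 250 and 67.5 MB respectively, i.e. roughly 3.7 times smaller than a 2-bit encoding.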
@m_karasikov
Mikhail Karasikov
4 years
Additionally, we apply a delta-like coding that extends the RowDiff scheme https://t.co/ADZ0KLCtPj (which would not have been possible without @danieldanciu). It computes the delta between the original annotation at each node and its predicted (expected) value reconstructed from the successor nodes.
1
0
2
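The RowDiff idea can be sketched on binary annotations (a set of labels per node). In this toy version, every non-anchor node with a successor stores the symmetric difference between its label set and its successor's, while anchors and nodes without a successor store their labels in full. Decoding XORs the stored rows along the successor chain until it reaches an anchor or a dead end, so anchors must break every cycle. The node names, labels and anchor choice are hypothetical, and the sketch does not cover how the published scheme selects successors and anchors.

```python
def rowdiff_encode(rows, succ, anchors):
    """rows: node -> frozenset of labels; succ: node -> successor node or None."""
    diffs = {}
    for v, labels in rows.items():
        if v in anchors or succ[v] is None:
            diffs[v] = labels                  # stored in full
        else:
            diffs[v] = labels ^ rows[succ[v]]  # delta to the successor's row
    return diffs

def rowdiff_decode(v, diffs, succ, anchors):
    """Rebuild the row of v by XOR-ing deltas up to an anchor or a dead end."""
    row = frozenset()
    while True:
        row ^= diffs[v]
        if v in anchors or succ[v] is None:
            return row
        v = succ[v]

rows = {
    "a": frozenset({"S1", "S2"}),
    "b": frozenset({"S1", "S2"}),
    "c": frozenset({"S1", "S2", "S3"}),
    "d": frozenset({"S3"}),
}
succ = {"a": "b", "b": "c", "c": "d", "d": None}
anchors = {"c"}
diffs = rowdiff_encode(rows, succ, anchors)
print({v: sorted(d) for v, d in diffs.items()})
# {'a': [], 'b': ['S3'], 'c': ['S1', 'S2', 'S3'], 'd': ['S3']}
```

Neighbouring nodes usually share most labels, so the stored deltas are sparse; anchors bound how far decoding has to walk.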
@m_karasikov
Mikhail Karasikov
4 years
The general idea is to decompose the annotation matrix into a sparse binary indicator matrix and dense vectors of attributes encoded separately. This decomposition allows directly applying existing schemes for a compressed representation of binary matrices and arrays.
1
0
2
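The decomposition can be illustrated with a small counts matrix (rows are k-mers, columns are samples; the values are hypothetical). The binary part records which entries are non-zero, and the dense attribute array lists the non-zero values in row-major order. The value at a set bit is found by its rank, i.e. the number of set bits before it. A real implementation would use compressed bit vectors with fast rank support; the linear scan here only shows the logic.

```python
def decompose(matrix):
    """Split a matrix into a binary indicator matrix and a dense value array."""
    bits = [[1 if x != 0 else 0 for x in row] for row in matrix]
    values = [x for row in matrix for x in row if x != 0]
    return bits, values

def rank(bits, i, j):
    """Number of set bits strictly before cell (i, j) in row-major order."""
    return sum(map(sum, bits[:i])) + sum(bits[i][:j])

def lookup(bits, values, i, j):
    """Attribute of cell (i, j): 0 if its bit is unset, else values[rank]."""
    return values[rank(bits, i, j)] if bits[i][j] else 0

counts = [[3, 0, 1],
          [0, 0, 0],
          [0, 7, 2]]
bits, values = decompose(counts)
print(values)                      # [3, 1, 7, 2]
print(lookup(bits, values, 2, 2))  # 2
```

Because `bits` is an ordinary binary matrix and `values` an ordinary array, each can be handed to an existing compressor independently.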
@m_karasikov
Mikhail Karasikov
4 years
The k-mer coordinates are then stored in a special succinct representation.
1
0
1
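The tweet does not name the succinct representation used for k-mer coordinates. As one standard succinct encoding for sorted integer lists (Elias-Fano, not necessarily the scheme used in this work), the sketch below splits each coordinate into low bits stored verbatim and a high part stored in unary in a bit vector. Bits are kept in Python lists for readability; `ef_size_bits` counts what a bit-packed version would occupy. The example coordinates are hypothetical.

```python
import math

def ef_encode(values, universe):
    """Elias-Fano encode a sorted list of integers in [0, universe)."""
    n = len(values)
    low_bits = max(0, math.floor(math.log2(universe / n))) if n else 0
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]              # low_bits bits each
    high = [0] * (n + (universe >> low_bits) + 1)  # unary-coded high parts
    for i, v in enumerate(values):
        high[(v >> low_bits) + i] = 1
    return low_bits, lows, high

def ef_access(enc, i):
    """Return the i-th value: its high part is (position of i-th set bit) - i."""
    low_bits, lows, high = enc
    seen = -1
    for pos, bit in enumerate(high):
        if bit:
            seen += 1
            if seen == i:
                return ((pos - i) << low_bits) | lows[i]
    raise IndexError(i)

def ef_size_bits(enc):
    """Size of a bit-packed encoding: low parts plus the high bit vector."""
    low_bits, lows, high = enc
    return low_bits * len(lows) + len(high)

coords = [3, 4, 7, 13, 14, 15, 21, 43]
enc = ef_encode(coords, universe=44)
print([ef_access(enc, i) for i in range(len(coords))] == coords)  # True
print(ef_size_bits(enc), "bits vs", 6 * len(coords), "bits at 6 bits/value")
# 36 bits vs 48 bits at 6 bits/value
```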