Andre Kahles (@akkah21@genomic.social) @akkah21 X Profile

Andre Kahles (@akkah21@genomic.social)

@akkah21

Followers

157

Following

19

Media

9

Statuses

47

Joined September 2012


Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

We invite you to try out MetaGraph at https://t.co/ePQTUNef1p, learn more about our framework in the paper (https://t.co/WQgDjIYDZL), or start building your own indexes from your own data (https://t.co/W5vPFPym3T).

github.com

Scalable annotated de Bruijn graphs for DNA indexing, alignment, and assembly - ratschlab/metagraph

0

4

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

We would like to thank the bioinformatics community for years of support and openness. A special thanks to the Logan effort, whose contig set we use as input for one of our largest indexes.

1

0

3

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

While MetaGraph provides a lossless representation of the input k-mer set, it is not a lossless compression of the raw reads. To reach petabase scale, we remove noisy k-mers prior to indexing — a step that we show has only minimal impact on search sensitivity.

1

0

3

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

We show that MetaGraph indexes are both scalable and cost-efficient for querying. Searching 1 Mbp of sequence against the entire SRA costs less than $1 on standard cloud infrastructure, making petabase-scale biological data truly searchable and accessible.

1

0

3

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

Our indexes support fast exact matching as well as alignment with edits. Labels can represent sample metadata, coordinates or quantification values. We can store 10’000 human transcriptome samples in < 160 GB and return position-wise expression for any queried sequence.

1

0

5

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

We have already processed more than 10 Petabases of raw sequence data from the SRA and make the compressed indexes publicly available for search (https://t.co/Mdhe96Z9Jw), download and cloud-based access.

1

0

4

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

At its core, MetaGraph represents all input sequences as labeled, succinct de Bruijn graphs — a highly compressed yet fully searchable structure. Each k-mer carries metadata labels that remain interactively queryable through a flexible API.

1

0

3
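The idea in the post above, k-mers carrying queryable labels, can be illustrated with a toy sketch. This is not MetaGraph's API or its succinct data structure: it uses a plain Python dictionary, and the names `build_index`, `query`, `sample_A` and `sample_B` are hypothetical. The de Bruijn graph edges are left implicit (two k-mers are adjacent when they overlap by k-1 characters).

```python
from collections import defaultdict


def build_index(samples, k):
    """Map each k-mer to the set of sample labels whose sequence contains it."""
    index = defaultdict(set)
    for label, seq in samples.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(label)
    return index


def query(index, seq, k):
    """Return, per label, the fraction of the query's k-mers found under that label."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hits = defaultdict(int)
    for kmer in kmers:
        # .get avoids creating empty entries for unseen k-mers
        for label in index.get(kmer, ()):
            hits[label] += 1
    return {label: count / len(kmers) for label, count in hits.items()}


# Hypothetical toy input: two labeled sequences.
samples = {"sample_A": "ACGTACGGA", "sample_B": "TTACGGATC"}
index = build_index(samples, k=4)
result = query(index, "GTACGG", k=4)
```

Here the query shares all three of its 4-mers with `sample_A` and two of three with `sample_B`. A real index at this scale would need compressed representations of both the k-mer set and the label matrix, which this sketch does not attempt.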

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

Modern biology produces vast amounts of raw sequencing data — genomes, transcriptomes, and protein sequences. MetaGraph provides a unified computational framework to index, query, and reason across this landscape of biological information.

1

0

3

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

The following thread describes the main ideas and results of this joint work with @gxr @m_karasikov @HarunMustafa416 @adamant_pwn

1

0

3

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 month

After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework, which enables petabase-scale search across sequencing data, has been published today in Nature (https://t.co/WQgDjIYDZL).

nature.com

Nature - MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.

1

25

63

Rayan Chikhi

@RayanChikhi

1 year

Today we’re excited to freely share an early version of what is perhaps the world’s most expansive genetics dataset: Logan. #bioinformatics #petabase #genetics #genomics #openscience https://t.co/CQSBmvn7se

biorxiv.org

The NCBI Sequence Read Archive (SRA) is the largest public repository of DNA sequencing data, containing the most comprehensive snapshot of Earth’s genetic diversity to date. As its size exceeds 50.0...

6

139

321

Andre Kahles (@akkah21@genomic.social)

@akkah21

1 year

Ready for MetaSUB Global City Sampling Day 2024 in Zürich, Switzerland 🦠#metasub #gcsd2024 #urbanmicrobiome @metasub

0

4

13

Giulio Ermanno Pibiri

@giulio_pibiri

1 year

Happy to share the result of a very fruitful collaboration with @curious_coding -- the "mod-minimizer". Excited even more for what is yet to come! https://t.co/6XzCoJhY6A 🧵 1/10

biorxiv.org

Motivation: Given a string S, a minimizer scheme is an algorithm defined by a triple (k,w,O) that samples a subset of k-mers (k-long substrings) from a string S. Specifically, it samples the smallest...

2

22

41
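The (k,w,O) triple mentioned in the abstract above can be made concrete with a small sketch of a classic minimizer scheme: in every window of w consecutive k-mers, sample the smallest k-mer under the order O. This is the textbook baseline, not the mod-minimizer from the linked paper. The function names and the hash-based default order are assumptions chosen for illustration.

```python
import hashlib


def hashed_order(kmer):
    """Pseudo-random order O on k-mers via a hash (an illustrative choice)."""
    return hashlib.blake2b(kmer.encode(), digest_size=8).digest()


def minimizers(s, k, w, order=hashed_order):
    """Return the sorted start positions of k-mers sampled by a (k, w, O) scheme."""
    n_kmers = len(s) - k + 1
    sampled = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in the window; ties are broken by the leftmost position.
        best = min(window, key=lambda i: (order(s[i:i + k]), i))
        sampled.add(best)
    return sorted(sampled)


# Lexicographic order makes the small example easy to check by hand.
positions = minimizers("ACGTACGT", k=3, w=3, order=lambda kmer: kmer)
```

With lexicographic order, the four windows over the six 3-mers of `ACGTACGT` select positions 0, 1, 4 and 4, so three distinct k-mers are sampled. Every window contains at least one sampled k-mer by construction, which is the property that makes such schemes useful for sketching and indexing.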

Giulio Ermanno Pibiri

@giulio_pibiri

2 years

Really excited about this work in progress with @curious_coding! A simple idea that turns out to work well in practice (it saves space already when plugged into SSHash) and can be easily proved to almost match the lower bound for randomized schemes. Still, many questions ahead! :)

Ragnar {Groot Koerkamp} 🦋

@curious_coding

2 years

Working more on minimizers now with @giulio_pibiri. So much fun! We can now do better than miniception for both small k (green) and large k (purple). This also breaks Schleimer's lower bound for random schemes. 1/3

0

1

10

Andre Kahles (@akkah21@genomic.social)

@akkah21

2 years

Last chance to send your abstract to the International Genome Graph Symposium 2024 in Ascona! More info at https://t.co/pUdfChMqAQ Keynotes include @ArangRhie @BenLangmead @CamilleMrcht @EimearEKenny @ewanbirney @jasmijnbaaijens @MakovaLab @marinkazitnik @ZaminIqbal #iggsy24

0

4

9

Andre Kahles (@akkah21@genomic.social)

@akkah21

2 years

Registration for the International Genome Graph Symposium 2024 is now open at https://t.co/pUdfChMqAQ. Join us in Ascona for a great lineup of speakers and come to present your work. #iggsy24

0

8

16

CAMDA 2025

@CAMDA_conf

3 years

Join leading researchers at #CAMDA'23! Top keynotes by Karsten Borgwardt on predicting medical complications by #MachineLearning & Edward Feil on antimicrobial resistance! Submit your best by 18 May 👉 https://t.co/mf4V6zWioa @maxplanckpress @kmborgwardt @EpicFeil_ #AMR #OneHealth

0

9

11

Amir Joudaki

@AmirJoudaki

3 years

🎉Happy to share our paper on using long seed sketches for alignment has been published in @genomeresearch! A big thank you to our amazing collaborators @alexmeterez, @akkah21, and @gxr Check out the paper:

0

7

22

Andre Kahles (@akkah21@genomic.social)

@akkah21

3 years

Looking for a postdoc to work on algorithms for selective long-read sequencing - please share: https://t.co/aLkW34O9z0

0

3

4

Pesho Ivanov 🇺🇦

@peshotrie

3 years

@curious_coding and I extended the seed heuristic to exact alignment of long (Mbps) erroneous (≤15%) sequences. The empirical near-linear runtime makes our aligner A*PA 250x faster than Edlib and WFA on synthetic data, and looks promising on human data. https://t.co/YmqdAOml32

3

20

69