Aman Patel
@amanpatel100
Followers: 90 · Following: 38 · Media: 2 · Statuses: 27
CS PhD Student @ Stanford, interested in ML approaches for DNA sequence modeling and the study of regulatory evolution
Joined October 2016
@amanpatel100 is a fantastic CS grad student graduating ~March 2026, interested in AIxBio industry positions. Has deep expertise in DNA/bio language models, sequence-to-function models, popgen/evolution. Please touch base with him (link in next message) if you have positions 1/
3 · 13 · 87
Excited to announce DART-Eval, our latest work on benchmarking DNALMs! Catch us at #NeurIPS!
[Quoted tweet: the (1/10) DART-Eval announcement from the thread below]
0 · 4 · 6
Excited to announce DART-Eval, a set of robust DNA Language Model evaluations!
[Quoted tweet: the (1/10) DART-Eval announcement from the thread below]
0 · 4 · 4
(10/10) Come check out our poster (tomorrow at 11 AM) or read the paper for more details! https://t.co/YDe4RMCrhQ
https://t.co/TTKnMQ94jP
https://t.co/yOyIKBv5sU
#machinelearning #NeurIPS2024 #genomics
github.com · kundajelab/DART-Eval
0 · 1 · 3
(9/10) How do we train more effective DNALMs? Use better data and objectives:
• Nailing short-context tasks before long-context
• Data sampling to account for class imbalance (see the sketch after this tweet)
• Conditioning on cell type context
These strategies use external annotations, which are plentiful!
1 · 0 · 0
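A minimal sketch of the class-imbalance point above, assuming a PyTorch training setup: reweight the sampler so that rare functional elements are drawn about as often as background sequence. The toy dataset, 200-bp windows, and 5% positive rate are illustrative placeholders, not the DART-Eval code.

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy data: 1,000 integer-encoded 200-bp windows, ~5% "functional" positives.
seqs = torch.randint(0, 4, (1000, 200))
labels = (torch.rand(1000) < 0.05).long()

# Weight each example inversely to its class frequency.
class_counts = torch.bincount(labels, minlength=2).float()
weights = (1.0 / class_counts)[labels]

sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
loader = DataLoader(TensorDataset(seqs, labels), batch_size=64, sampler=sampler)
# Batches drawn from `loader` are now roughly class-balanced.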
(8/10) This indicates that DNALMs learn functional DNA inconsistently. We believe the culprit is not architecture, but rather the sparse and imbalanced distribution of functional DNA elements. Given their resource requirements, current DNALMs are a hard sell.
1 · 0 · 0
(7/10) DNALMs struggle with more difficult tasks. Furthermore, small models trained from scratch (<10M params) routinely outperform much larger DNALMs (>1B params), even after LoRA fine-tuning! Our results on the hardest task: counterfactual variant effect prediction.
1 · 0 · 0
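One hedged reading of how such a counterfactual task can be scored zero-shot with a DNALM: compare the model's log-probabilities of the reference and alternate alleles at the variant position. Everything below (the toy model, sequence, and coordinates) is an illustrative stand-in, not the DART-Eval protocol.

import torch

BASES = "ACGT"

def toy_dnalm(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a real DNALM: per-position logits over A/C/G/T."""
    torch.manual_seed(0)
    return torch.randn(tokens.shape[0], tokens.shape[1], 4)

def variant_effect_score(seq: str, pos: int, ref: str, alt: str) -> float:
    assert seq[pos] == ref, "reference allele must match the sequence"
    tokens = torch.tensor([[BASES.index(b) for b in seq]])
    logits = toy_dnalm(tokens)                        # (1, L, 4)
    logp = torch.log_softmax(logits[0, pos], dim=-1)  # allele distribution at the site
    return (logp[BASES.index(alt)] - logp[BASES.index(ref)]).item()

# A positive score means the model finds the alternate allele more likely.
print(variant_effect_score("ACGTACGTAC", pos=4, ref="A", alt="G"))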
(6/10) We introduce DART-Eval, a suite of five biologically informed DNALM evaluations focusing on transcriptional regulatory DNA, ordered by increasing difficulty.
1 · 0 · 0
(5/10) Rigorous evaluations of DNALMs, though critical, are lacking. Existing benchmarks:
• Focus on surrogate tasks tenuously related to practical use cases
• Suffer from inadequate controls and other dataset design flaws
• Compare against outdated or inappropriate baselines
1 · 0 · 0
(4/10) An effective DNALM should:
• Learn representations that can accurately distinguish different types of functional DNA elements (a linear-probe sketch follows this tweet)
• Serve as a foundation for downstream supervised models
• Outperform models trained from scratch
1 · 0 · 0
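A minimal sketch of the first criterion, assuming frozen DNALM embeddings are already in hand: fit a linear probe and check whether element types separate. The random embeddings and binary labels are placeholders for real pooled representations, not data from the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 256))   # placeholder pooled per-sequence embeddings
y = rng.integers(0, 2, size=500)    # placeholder element labels (e.g., enhancer vs. control)

X_tr, X_te, y_tr, y_te = train_test_split(emb, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))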
(3/10) However, DNA is vastly different from text, being much more heterogeneous, imbalanced, and sparse. Imagine a blend of several different languages interspersed with a load of gibberish.
1 · 0 · 0
(2/10) DNALMs are a new class of self-supervised models for DNA, inspired by the success of LLMs. They are typically pre-trained solely on genomic DNA, without considering any external annotations.
1 · 0 · 0
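For concreteness, a hedged sketch of that self-supervised recipe: masked-token prediction over raw genomic windows, with no annotations anywhere in the loss. The tiny transformer and random "genome" below are illustrative stand-ins, not any published DNALM.

import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, MASK = 5, 4  # A, C, G, T, plus a [MASK] token

model = nn.Sequential(
    nn.Embedding(VOCAB, 64),
    nn.TransformerEncoder(nn.TransformerEncoderLayer(64, 4, batch_first=True), 2),
    nn.Linear(64, 4),  # predict the original base at each position
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seqs = torch.randint(0, 4, (32, 128))   # a batch of random "genomic" windows
mask = torch.rand(seqs.shape) < 0.15    # hide 15% of positions
logits = model(seqs.masked_fill(mask, MASK))
loss = nn.functional.cross_entropy(logits[mask], seqs[mask])
loss.backward()
opt.step()
print("masked-LM loss:", loss.item())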
(1/10) Excited to announce our latest work! @arpi_ta_s, @AustinWang14, and I will be presenting DART-Eval, a rigorous suite of evals for DNA Language Models on transcriptional regulatory DNA at #NeurIPS2024. Check it out!
arxiv.org
Recent advances in self-supervised models for natural language, vision, and protein sequences have inspired the development of large genomic DNA language models (DNALMs). These models aim to learn...
2 · 6 · 27
Thrilled to announce that I'll be joining the incredible researchers at @IMPvienna for a year as a visiting scientist and then joining @UMassChan as an assistant professor in Genomics+CompBio in 2025! At both places, I'll be continuing my work on deep learning + genomics.
61 · 16 · 354
Is DNA all you need? Introducing Evo, a long-context 7B foundation model for biology.
Evo has SOTA *zero-shot* prediction across DNA, RNA, and protein modalities.
Evo can generate DNA, RNA + proteins & make CRISPR-Cas systems for the first time.
Blog: https://t.co/5xzGhJPfnK
9 · 149 · 679
America, I’m honored that you have chosen me to lead our great country. The work ahead of us will be hard, but I promise you this: I will be a President for all Americans — whether you voted for me or not. I will keep the faith that you have placed in me.
178K · 464K · 3M
Another chance to RETWEET to win a gift card courtesy of @PapaJohnsHousTx! #OpeningDayAtHome #ForTheH Photo via @PapaJohnsHousTx
21 · 2K · 507
47 · 4K · 1K