Aman Patel Profile
Aman Patel

@amanpatel100

Followers
90
Following
38
Media
2
Statuses
27

CS PhD Student @ Stanford, interested in ML approaches for DNA sequence modeling and the study of regulatory evolution

Joined October 2016
@anshulkundaje
Anshul Kundaje
10 days
@amanpatel100 is a fantastic CS grad student graduating ~March 2026, interested in AIxBio industry positions. Has deep expertise in DNA/bio language models, sequence-to-function models, popgen/evolution. Please touch base with him (link in next message) if you have positions 1/
3
13
87
@arpi_ta_s
Arpita Singhal
1 year
Excited to announce DART-Eval, our latest work on benchmarking DNALMs! Catch us at #NeurIPS!
@amanpatel100
Aman Patel
1 year
(1/10) Excited to announce our latest work! @arpi_ta_s, @AustinWang14, and I will be presenting DART-Eval, a rigorous suite of evals for DNA Language Models on transcriptional regulatory DNA at #NeurIPS2024. Check it out!
0
4
6
@austintwang
Austin Wang
1 year
Excited to announce DART-Eval, a set of robust DNA Language Model evaluations!
@amanpatel100
Aman Patel
1 year
(1/10) Excited to announce our latest work! @arpi_ta_s, @AustinWang14, and I will be presenting DART-Eval, a rigorous suite of evals for DNA Language Models on transcriptional regulatory DNA at #NeurIPS2024. Check it out!
0
4
4
@amanpatel100
Aman Patel
1 year
(9/10) How do we train more effective DNALMs? Use better data and objectives:
• Nailing short-context tasks before long-context
• Data sampling to account for class imbalance
• Conditioning on cell type context
These strategies use external annotations, which are plentiful!
1
0
0
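The "data sampling to account for class imbalance" strategy from the tweet above can be sketched with inverse-frequency sampling weights. This is a minimal illustration, not the DART-Eval authors' implementation; the toy 0/1 labels (background vs. functional DNA windows) are an assumption for the example.

```python
import numpy as np

def balanced_sampling_probs(labels):
    """Inverse-frequency sampling probabilities so each class is drawn
    equally often during training.

    labels: array of 0/1 class labels, e.g. background (0) vs.
    functional (1) DNA windows -- a hypothetical labelling scheme.
    """
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=2)
    # Weight each example by the inverse of its class frequency,
    # then normalise the weights into a probability distribution.
    weights = 1.0 / counts[labels]
    return weights / weights.sum()

# Heavily imbalanced toy dataset: 9 background windows, 1 functional element.
labels = np.array([0] * 9 + [1])
probs = balanced_sampling_probs(labels)
# Each class now receives half of the total sampling mass, so a sampler
# drawing from `probs` sees functional elements far more often than
# their raw frequency would allow.
```

A training loop would then draw minibatch indices with `np.random.default_rng().choice(len(labels), p=probs)` rather than uniformly.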
@amanpatel100
Aman Patel
1 year
(8/10) This indicates that DNALMs inconsistently learn functional DNA. We believe that the culprit is not architecture, but rather the sparse and imbalanced distribution of functional DNA elements. Given their resource requirements, current DNALMs are a hard sell.
1
0
0
@amanpatel100
Aman Patel
1 year
(7/10) DNALMs struggle with more difficult tasks. Furthermore, small models trained from scratch (<10M params) routinely outperform much larger DNALMs (>1B params), even after LoRA fine-tuning! Our results on the hardest task: counterfactual variant effect prediction.
1
0
0
@amanpatel100
Aman Patel
1 year
(6/10) We introduce DART-Eval, a suite of five biologically informed DNALM evaluations focusing on transcriptional regulatory DNA ordered by increasing difficulty.
1
0
0
@amanpatel100
Aman Patel
1 year
(5/10) Rigorous evaluations of DNALMs, though critical, are lacking. Existing benchmarks:
• Focus on surrogate tasks tenuously related to practical use cases
• Suffer from inadequate controls and other dataset design flaws
• Compare against outdated or inappropriate baselines
1
0
0
@amanpatel100
Aman Patel
1 year
(4/10) An effective DNALM should:
• Learn representations that can accurately distinguish different types of functional DNA elements
• Serve as a foundation for downstream supervised models
• Outperform models trained from scratch
1
0
0
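The first criterion above — representations that distinguish functional element types — is commonly tested with a linear probe on frozen embeddings. A minimal sketch, assuming synthetic, linearly separable embeddings stand in for two hypothetical element classes (real DNALM embeddings would come from the pretrained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen DNALM embeddings for two element types (e.g.
# promoters vs. enhancers); the class-dependent means are an assumption
# that makes this toy dataset linearly separable.
emb_a = rng.normal(loc=+1.0, size=(50, 8))
emb_b = rng.normal(loc=-1.0, size=(50, 8))
X = np.vstack([emb_a, emb_b])
y = np.array([1] * 50 + [0] * 50)

# Least-squares linear probe: fit a single linear layer on the frozen
# features, with no fine-tuning of the model that produced them.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
preds = (Xb @ w > 0.5).astype(int)
accuracy = (preds == y).mean()
```

High probe accuracy indicates the embeddings linearly encode the distinction; chance-level accuracy on real data would suggest the model has not learned it.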
@amanpatel100
Aman Patel
1 year
(3/10) However, DNA is vastly different from text, being much more heterogeneous, imbalanced, and sparse. Imagine a blend of several different languages interspersed with a load of gibberish.
1
0
0
@amanpatel100
Aman Patel
1 year
(2/10) DNALMs are a new class of self-supervised models for DNA, inspired by the success of LLMs. These DNALMs are often pre-trained solely on genomic DNA without considering any external annotations.
1
0
0
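The annotation-free pretraining described in (2/10) is typically a BERT-style masked-token objective over raw nucleotides. A minimal sketch of the masking step only (the vocabulary, mask rate, and function name are illustrative assumptions, not the setup of any specific DNALM):

```python
import numpy as np

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """BERT-style masking for a DNA sequence: hide ~mask_rate of the
    bases and return (masked token ids, original ids, boolean array
    marking the hidden positions)."""
    rng = rng or np.random.default_rng(0)
    ids = np.array([VOCAB[b] for b in seq])
    hide = rng.random(len(ids)) < mask_rate
    masked = ids.copy()
    masked[hide] = VOCAB["[MASK]"]
    return masked, ids, hide

masked, ids, hide = mask_sequence("ACGTACGTACGTACGT", mask_rate=0.25)
# The pretraining loss is cross-entropy between the model's predictions
# and `ids`, computed only at the hidden positions -- no external
# annotations are involved.
```

Because the objective uses only the sequence itself, it inherits whatever imbalance the genome has, which is the issue the later tweets in the thread raise.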
@amanpatel100
Aman Patel
1 year
(1/10) Excited to announce our latest work! @arpi_ta_s, @AustinWang14, and I will be presenting DART-Eval, a rigorous suite of evals for DNA Language Models on transcriptional regulatory DNA at #NeurIPS2024. Check it out!
arxiv.org
Recent advances in self-supervised models for natural language, vision, and protein sequences have inspired the development of large genomic DNA language models (DNALMs). These models aim to learn...
2
6
27
@jmschreiber91
Jacob Schreiber
2 years
Thrilled to announce that I'll be joining the incredible researchers at @IMPvienna for a year as a visiting scientist and then joining @UMassChan as an assistant professor in Genomics+CompBio in 2025! At both places, I'll be continuing my work on deep learning + genomics.
61
16
354
@exnx
Eric Nguyen
2 years
Is DNA all you need? Introducing Evo, a long-context 7B foundation model for biology. Evo has SOTA *zero-shot* prediction across DNA, RNA, and protein modalities. Evo can generate DNA, RNA + proteins & make CRISPR-Cas systems for the first time. Blog: https://t.co/5xzGhJPfnK
9
149
679
@JoeBiden
Joe Biden
5 years
America, I’m honored that you have chosen me to lead our great country. The work ahead of us will be hard, but I promise you this: I will be a President for all Americans — whether you voted for me or not. I will keep the faith that you have placed in me.
178K
464K
3M
@astros
Houston Astros
6 years
Another chance to RETWEET to win a gift card courtesy of @PapaJohnsHousTx! #OpeningDayAtHome #ForTheH Photo via @PapaJohnsHousTx
21
2K
507
@astros
Houston Astros
6 years
A future lunch on @Pluckers, just RETWEET to win! #OpeningDayAtHome #ForTheH Photo via @Pluckers
47
4K
1K
@astros
Houston Astros
6 years
RETWEET to win our 2020 Bobblehead package! #OpeningDayAtHome #ForTheH
157
5K
2K
@Arsenal
Arsenal
6 years
Welcome back, Mikel! 👋 Head coach of The Arsenal 👔
3K
28K
79K