Abhinav Adduri Profile
Abhinav Adduri

@abhinadduri

Followers: 235 · Following: 216 · Media: 2 · Statuses: 105

Machine Learning Research Scientist @ArcInstitute. Computer Science Ph.D. @CarnegieMellon.

Joined October 2013
Abhinav Adduri (@abhinadduri) · 3 days
Check it out: Model for embedding:
0 replies · 0 retweets · 4 likes
Abhinav Adduri (@abhinadduri) · 3 days
when using bf16-mixed, always double check your checkpoints on disk were not autocasted. in other news, we patched this in state v0.9.9 for ~8-9x faster embedding with SE-600M, with no loss in quality 😃. thanks @fleetwood___ for first noticing the checkpoint was in fp32
[image attached]
1 reply · 0 retweets · 10 likes
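The failure mode above is easy to audit. Here is a generic PyTorch sketch for checking the precision of tensors actually stored on disk — the filename and the Lightning-style `state_dict` nesting are assumptions, not `state`'s confirmed checkpoint layout:

```python
import torch

def checkpoint_dtypes(path: str) -> set:
    """Collect the dtypes of all tensors stored in a checkpoint file on disk."""
    ckpt = torch.load(path, map_location="cpu")
    # Lightning-style checkpoints nest weights under "state_dict";
    # fall back to treating the top level as the weight dict.
    tensors = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    return {t.dtype for t in tensors.values() if torch.is_tensor(t)}
```

Calling this on a checkpoint you believe was trained with bf16-mixed and getting back `{torch.float32}` is exactly the silent-autocast symptom described above (hypothetical path shown): `checkpoint_dtypes("se600m.ckpt")`.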
Abhinav Adduri (@abhinadduri) · 4 days
RT @phil_fradkin: We're excited to introduce our new work on mature mRNA property prediction, co-first authored with the amazing @ianshi3 a…
0 replies · 8 retweets · 0 likes
Abhinav Adduri (@abhinadduri) · 4 days
I really enjoyed @NikoMcCarty's piece with Hani (@genophoria): "AlphaFold is important because it says something about a protein's function by predicting its structure. Similarly, if two cells have the same gene expression patterns, then they probably have the same function…"
1 reply · 1 retweet · 13 likes
Abhinav Adduri (@abhinadduri) · 8 days
I guess this would work well when input sequences have very non-uniform information content. Excited to hack on this.
0 replies · 0 retweets · 1 like
Abhinav Adduri (@abhinadduri) · 8 days
A new method for learning tokenization - dynamically learning how to chunk seems interesting for DNA models, which may not always need to spend capacity on single-nucleotide resolution. Need to think more deeply about this for other bio modalities 🤔🤔
Sukjun (June) Hwang (@sukjun_hwang) · 8 days
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
[two images attached]
1 reply · 0 retweets · 4 likes
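The chunking idea can be illustrated with a toy sketch. This is not the H-Net itself — H-Net learns its boundary scores end-to-end inside the network — here the scores are simply given as inputs to show how variable-size chunks fall out of them:

```python
def dynamic_chunks(tokens, boundary_scores, threshold=0.5):
    """Group a token stream into variable-size chunks, starting a new
    chunk wherever the (assumed learned) boundary score crosses the
    threshold. Illustrative only, not the H-Net algorithm."""
    chunks, current = [], []
    for tok, score in zip(tokens, boundary_scores):
        if current and score >= threshold:
            chunks.append(current)  # close the current chunk at a boundary
            current = []
        current.append(tok)
    if current:
        chunks.append(current)  # flush the trailing chunk
    return chunks
```

For example, `dynamic_chunks(list("ACGTAC"), [0, 0, 0.9, 0, 0.9, 0])` yields `[['A', 'C'], ['G', 'T'], ['A', 'C']]` — a repetitive stretch with low boundary scores gets merged into fewer, larger chunks, while an information-dense region can get per-base chunks, which is the appeal for DNA models noted above.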
Abhinav Adduri (@abhinadduri) · 9 days
@arcinstitute challenge website:
0 replies · 0 retweets · 1 like
Abhinav Adduri (@abhinadduri) · 9 days
Our work with the STATE model tackled context generalization for *seen* perturbations. @arcinstitute's Virtual Cell Challenge (VCC) focuses on a more challenging scenario: context generalization for *unseen* perturbations. Colab notebook to train your own STATE models for VCC:
3 replies · 5 retweets · 54 likes
Abhinav Adduri (@abhinadduri) · 10 days
RT @pdhsu: Our Virtual Cell Challenge commentary is one of the most read @CellCellPress articles for the last month, alongside Weinberg's c…
0 replies · 7 retweets · 0 likes
Abhinav Adduri (@abhinadduri) · 10 days
We updated the State Embedding 600M checkpoint on the @ArcInstitute Hugging Face. This model was trained with 4x FLOPs compared to the preprint model. It achieves significantly lower val/loss and does better on internal evals - would recommend using this over the 4-epoch one for…
[image attached]
1 reply · 9 retweets · 52 likes
Abhinav Adduri (@abhinadduri) · 12 days
RT @ElliotHershberg: This is one of the core questions I tried to answer in my recent essay on virtual cells. If we are still far from mod…
0 replies · 4 retweets · 0 likes
Abhinav Adduri (@abhinadduri) · 15 days
RT @sokrypton: CASP is getting cut by NIH. 😢 (Anyone with extra funds wanna help support perhaps the most important competition of the c…
0 replies · 126 retweets · 0 likes
Abhinav Adduri (@abhinadduri) · 21 days
RT @dlwh: So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public…
0 replies · 100 retweets · 0 likes
Abhinav Adduri (@abhinadduri) · 22 days
Note that a lot of genome-scale perturb-seq datasets are also in this regime.
0 replies · 1 retweet · 4 likes
Abhinav Adduri (@abhinadduri) · 22 days
Folks should read this excellent work by @probablybots. Contextualized is a multi-task learning framework that learns population heterogeneity. This makes it useful for inferring sample-specific GRNs, and more generally, for learning in the "high # contexts, few samples per…
Caleb Ellington (@probablybots) · 2 months
Honored to share a major thread of my PhD research, out now in PNAS. We address a core issue with how models are used for scientific discovery. Models are so important that they define the entire scientific process. 1/n
[image attached]
1 reply · 1 retweet · 11 likes
Abhinav Adduri (@abhinadduri) · 22 days
twitter is an interesting platform. everyone just talks science. I am late to the party 😂.
2 replies · 0 retweets · 3 likes
Abhinav Adduri (@abhinadduri) · 22 days
What this suggests is that foundational, high-quality datasets can help us learn foundation models of perturbation biology - and even help us transfer to different modalities.
0 replies · 0 retweets · 5 likes
Abhinav Adduri (@abhinadduri) · 22 days
Thanks for the shoutout @ElliotHershberg! This was one of the most exciting parts for us as well. We also found that when using cell embeddings, pre-training on Tahoe-100M improved our zero-shot transfer on genetic and signaling datasets!
Elliot Hershberg (@ElliotHershberg) · 22 days
The result I'm most excited about from Arc's new State model: the ability to generalize on zero-shot out-of-distribution predictions after pre-training on the TAHOE-100M data set. Whereas PLMs have seemingly benefitted less from scaling data and model size, this is an inkling…
[image attached]
1 reply · 3 retweets · 12 likes