
Abhinav Adduri
@abhinadduri
Followers
235
Following
216
Media
2
Statuses
105
Machine Learning Research Scientist @ArcInstitute. Computer Science Ph.D. @CarnegieMellon.
Joined October 2013
when using bf16-mixed, always double-check that your checkpoints on disk were not autocast to fp32. in other news, we patched this in state v0.9.9 for ~8-9x faster embedding with SE-600M, with no loss in quality 😃. thanks @fleetwood___ for first noticing the checkpoint was in fp32
1
0
10
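A quick way to catch the issue described in the tweet above is to inspect the dtypes stored in the checkpoint file itself. This is a minimal sketch and not part of the state codebase: the checkpoint path is a placeholder, and the nesting under "state_dict" assumes a Lightning-style checkpoint.

```python
# Minimal sketch: report parameter dtypes in a saved checkpoint to detect
# tensors that were silently upcast to fp32 during bf16-mixed training.
from collections import Counter

import torch

def checkpoint_dtypes(path: str) -> Counter:
    """Count the dtypes of all tensors in a saved state dict."""
    # weights_only=False because training checkpoints often pickle extra objects
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    # Lightning-style checkpoints nest weights under "state_dict"
    state_dict = ckpt.get("state_dict", ckpt)
    return Counter(str(t.dtype) for t in state_dict.values() if torch.is_tensor(t))

if __name__ == "__main__":
    counts = checkpoint_dtypes("checkpoints/se600m.ckpt")  # placeholder path
    print(counts)  # e.g. Counter({'torch.bfloat16': 512}) if nothing was upcast
    if counts.get("torch.float32", 0):
        print("warning: fp32 tensors found; the bf16 checkpoint may have been autocast on save")
```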
RT @phil_fradkin: We're excited to introduce our new work on mature mRNA property prediction, co-first authored with the amazing @ianshi3 a….
0
8
0
I really enjoyed @NikoMcCarty's piece with Hani @genophoria: "AlphaFold is important because it says something about a protein's function by predicting its structure. Similarly, if two cells have the same gene expression patterns, then they probably have the same function,…
1
1
13
A new method for learning tokenization: dynamic chunking seems interesting for DNA models, which may not always need to spend capacity on single-nucleotide resolution. Need to think more deeply about this for other bio modalities 🤔🤔
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
1
0
4
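The dynamic-chunking idea in the quoted H-Net tweet can be pictured with a toy boundary predictor that pools single-nucleotide embeddings into variable-length chunks. This is an illustrative sketch only, not the H-Net architecture: the module name, shapes, and the hard 0.5 threshold (which is not differentiable; the real model uses a smoother mechanism) are all assumptions.

```python
# Toy illustration of learned chunking: a per-position boundary score decides
# where to merge fine-grained embeddings into coarser chunk vectors.
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.boundary_scorer = nn.Linear(d_model, 1)  # per-position boundary logit

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x: (seq_len, d_model) embeddings at single-nucleotide resolution
        boundary = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1) > 0.5
        chunks, start = [], 0
        for i in range(x.size(0)):
            if boundary[i] or i == x.size(0) - 1:
                # mean-pool each discovered span into one coarser token
                chunks.append(x[start : i + 1].mean(dim=0))
                start = i + 1
        return chunks

# Usage: 128 nucleotide embeddings in 64 dims collapse into far fewer chunk vectors.
chunker = DynamicChunker(d_model=64)
pooled = chunker(torch.randn(128, 64))
print(len(pooled), pooled[0].shape)
```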
Our work with the STATE model tackled context generalization for *seen* perturbations. @arcinstitute's Virtual Cell Challenge (VCC) focuses on a more challenging scenario: context generalization for *unseen* perturbations. Colab notebook to train your own STATE models for VCC:
3
5
54
RT @pdhsu: Our Virtual Cell Challenge commentary is one of the most read @CellCellPress articles for the last month, alongside Weinberg's c….
0
7
0
We updated the State Embedding 600M checkpoint on the @ArcInstitute Hugging Face. This model was trained with 4x FLOPs compared to the preprint model. It achieves significantly lower val/loss and does better on internal evals - would recommend using this over the 4 epoch one for
1
9
52
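For anyone wanting to pull the updated weights, a minimal sketch using huggingface_hub is below; the repo id is a guess, so verify the actual name on the @ArcInstitute organization page on Hugging Face.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id; check the ArcInstitute org page for the real one.
local_dir = snapshot_download(repo_id="arcinstitute/SE-600M")
print("checkpoint files downloaded to", local_dir)
```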
RT @ElliotHershberg: This is one of the core questions I tried to answer in my recent essay on virtual cells. If we are still far from mod….
0
4
0
RT @sokrypton: CASP is getting cut by NIH. 😢. (Anyone with extra funds wanna help support perhaps the most important competition of the c….
0
126
0
RT @dlwh: So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public….
0
100
0
Folks should read this excellent work by @probablybots. Contextualized is a multi-task learning framework that learns population heterogeneity. This makes it useful for inferring sample-specific GRNs, and more generally, for learning in the "high # contexts, few samples per…
Honored to share a major thread of my PhD research, out now in PNAS. We address a core issue with how models are used for scientific discovery. Models are so important that they define the entire scientific process. 1/n
1
1
11
Thanks for the shoutout @ElliotHershberg! This was one of the most exciting parts for us as well. We also found that when using cell embeddings, pre-training on Tahoe-100M improved our zero-shot transfer on genetic or signaling datasets!
The result I'm most excited about from Arc's new State model: the ability to generalize on zero-shot out-of-distribution predictions after pre-training on the Tahoe-100M data set. Whereas PLMs have seemingly benefitted less from scaling data and model size, this is an inkling
1
3
12