Adithya Bhaskar Profile
Adithya Bhaskar

@AdithyaNLP

Followers
300
Following
135
Media
23
Statuses
50

Second Year CS Ph.D. student at Princeton University (@princeton_nlp), previously CS undergrad at IIT Bombay

Princeton, NJ
Joined June 2023
@AdithyaNLP
Adithya Bhaskar
1 year
Ever wished circuit finding was more precise, efficient, and scalable? Now it is! In our new preprint, we propose Edge Pruning, a conceptually simple yet effective way to find circuits in models. Details in 🧵! Work done with @_awettig @danfriedman0 @danqi_chen 1/6
Tweet media one
1
21
122
@AdithyaNLP
Adithya Bhaskar
5 days
RT @ChengleiSi: Are AI scientists already better than human researchers?. We recruited 43 PhD students to spend 3 months executing research….
0
162
0
@AdithyaNLP
Adithya Bhaskar
12 days
Paper also has (1) ablation & sensitivity studies (2) PruLong for pretraining (3) more idealized & real (hardware) metrics! Paper: Code: Special thanks to my coauthors @_awettig @YiheS5 @gaotianyu1350 @danqi_chen! 7/7
0
0
4
@AdithyaNLP
Adithya Bhaskar
12 days
Our modifications substantially reduce the critical KV footprint of the two methods (the footprint needed to retain 90% of performance) by up to 30 absolute percentage points, when evaluated on long -> short (HELMET) as well as long -> long (LongProc) benchmarks. 6/7
Tweet media one
1
0
2
@AdithyaNLP
Adithya Bhaskar
12 days
We also build on the SoTA recency-eviction method, DuoAttention, to create PruLong. We replace LASSO with hard-concrete pruning, swap its reconstruction objective for next-token prediction, and use naturally occurring long-context data instead of synthetic sequences. 5/7
1
0
2
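The hard-concrete pruning mentioned above can be sketched as follows. This is an illustrative gate in the style of the standard hard-concrete (L0) relaxation; the parameter defaults and function name are my own stand-ins, not taken from the PruLong code.

```python
import math
import random

def hard_concrete_sample(log_alpha, beta=0.5, gamma=-0.1, zeta=1.1):
    """Sample a gate in [0, 1] with point masses at exactly 0 and 1.

    Unlike a LASSO penalty, this stretched-and-clamped sigmoid lets
    gradient training drive gates to *exactly* 0 (head pruned to local
    attention) or 1 (head keeps global attention).
    """
    u = random.random()
    # Reparameterized concrete (Gumbel-sigmoid) sample.
    s = 1.0 / (1.0 + math.exp(-(math.log(u / (1.0 - u)) + log_alpha) / beta))
    s_bar = s * (zeta - gamma) + gamma  # stretch to (gamma, zeta)
    return min(1.0, max(0.0, s_bar))    # clamp, so exact 0/1 has mass
```

A very positive `log_alpha` yields a gate of exactly 1, a very negative one exactly 0, which is what makes the relaxation usable for discrete pruning decisions.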
@AdithyaNLP
Adithya Bhaskar
12 days
We then amend existing methods to achieve lower KV footprints. For two postfill eviction methods that use attention to discard KVs (PyramidKV and SnapKV), we propose chunked eviction: instead of evicting once after the prefill, evict after every chunk of the prefill. 4/7
1
0
2
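The chunked-eviction idea above can be sketched as a small loop. Here `score` is a hypothetical stand-in for the attention-based importance criterion used by methods like SnapKV/PyramidKV; the function name and budget interface are illustrative, not from the paper.

```python
def chunked_prefill_evict(num_tokens, chunk_size, budget, score):
    """Return indices of the KV entries kept after a chunked prefill."""
    kept = []
    for start in range(0, num_tokens, chunk_size):
        # Prefill one chunk, adding its KV entries to the cache.
        kept.extend(range(start, min(start + chunk_size, num_tokens)))
        if len(kept) > budget:
            # Evict down to the budget after every chunk, not just once
            # after the full prefill -- so the cache never holds all KVs.
            kept = sorted(kept, key=score, reverse=True)[:budget]
    return sorted(kept)
```

The point of the change is visible in the loop invariant: the cache never exceeds `budget + chunk_size` entries, whereas evicting only after the full prefill would momentarily hold all `num_tokens` KVs.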
@AdithyaNLP
Adithya Bhaskar
12 days
We propose to measure the "KV footprint": the fraction of KV cache entries that have not been evicted, aggregated across timesteps. A footprint means little without performance, hence the "critical footprint": the minimal KV footprint that retains 90% of performance. 3/7
Tweet media one
1
0
2
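The two metrics above can be sketched in a few lines. This is a minimal reading of the tweet's definitions; the paper's exact aggregation may differ, and all names here are illustrative.

```python
def kv_footprint(kept, total):
    """Fraction of KV entries not evicted, aggregated over timesteps.

    kept[t]  -- number of non-evicted KV entries at timestep t
    total[t] -- size a full (no-eviction) cache would have at t
    """
    return sum(k / n for k, n in zip(kept, total)) / len(kept)

def critical_footprint(runs, full_perf, threshold=0.9):
    """Smallest footprint among (footprint, performance) runs that
    retain at least `threshold` of full-cache performance."""
    ok = [fp for fp, perf in runs if perf >= threshold * full_perf]
    return min(ok) if ok else None
```

Tying the footprint to a performance threshold is what makes methods comparable: a method that evicts aggressively but collapses below 90% performance gets no credit for its small cache.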
@AdithyaNLP
Adithya Bhaskar
12 days
Generation = prefill (forward pass over input and saving KVs) + postfill (decoding 1 output token at a time). Some papers accelerate the prefill; others neglect it and minimize the memory overhead of postfilling. Some focus on throughput while others optimize memory usage. 2/7
Tweet media one
1
0
2
@AdithyaNLP
Adithya Bhaskar
12 days
There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called the "critical KV footprint". We compare existing methods and propose a new one, PruLong, which "prunes" certain attn heads to only look at local tokens. 1/7
Tweet media one
2
33
227
@AdithyaNLP
Adithya Bhaskar
21 days
RT @xiye_nlp: 🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for….
0
17
0
@AdithyaNLP
Adithya Bhaskar
2 months
RT @cindy_x_wu: Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimo….
0
44
0
@AdithyaNLP
Adithya Bhaskar
6 months
RT @cindy_x_wu: Want to train large vision-language models but drowning in data? Introducing ICONS - we demonstrat….
0
62
0
@AdithyaNLP
Adithya Bhaskar
6 months
RT @tyleryzhu: Have you ever wondered why we don’t use multiple visual encoders for VideoLLMs? We thought the same! . Excited to announce o….
0
12
0
@AdithyaNLP
Adithya Bhaskar
7 months
Paper: Code:
0
0
2
@AdithyaNLP
Adithya Bhaskar
7 months
I'll be at NeurIPS 2024 to present Edge Pruning (spotlight; link in thread)! Please stop by my poster at 11 am PST on the 13th (East Exhibit Hall A-C)! Also excited to chat about anything (but especially interpretability, reasoning, and preference optimization) - please DM!
Tweet media one
2
6
41
@AdithyaNLP
Adithya Bhaskar
9 months
RT @noamrazin: Past work observed that DPO often decreases the probability of preferred responses. So where does the probability go? 🧐. We….
0
15
0
@AdithyaNLP
Adithya Bhaskar
9 months
I’m at COLM from Monday to Wednesday. Reach out if you want to chat!
1
1
9
@AdithyaNLP
Adithya Bhaskar
11 months
RT @HowardYen1: Come check out our poster at #ACL2024! I will be at the 4pm poster session, stop by to chat about long-context models https….
0
4
0
@AdithyaNLP
Adithya Bhaskar
11 months
I'll be at ACL 2024! I'd love to chat about interpretability, preference optimization, science of LMs, or any NLP topic -- feel free to reach out! Oh, and I'll present The Heuristic Core ( both as an oral (Aug 13, 10:30) and a poster (Aug 12, 14:00).
3
4
48
@AdithyaNLP
Adithya Bhaskar
1 year
RT @danfriedman0: How can we understand neural chatbots in terms of interpretable, symbolic mechanisms? To explore this question, we constr….
0
30
0
@AdithyaNLP
Adithya Bhaskar
1 year
RT @SadhikaMalladi: My new blog post argues from first principles how length normalization in preference learning objectives (e.g., SimPO)….
0
22
0