Anmol Mekala Profile
Anmol Mekala (@anmol_mekala)
Followers: 40 · Following: 141 · Media: 5 · Statuses: 34

AI eng @ Salient | LLM unlearning & benchmarking research | CS @umassamherst, @iitbombay | Formerly @Microsoft

San Francisco, CA
Joined July 2023
Anmol Mekala (@anmol_mekala) · 3 months
📢 New Paper 📢 Struggling to fit very long contexts on your LLM? Considering 4-bit quantization to 2x your context window? Prior work says 4-bit is “good enough,” but on long-context tasks it can drop 16%, with up to 59% drops on specific models ❗❗ Details in 🧵👇
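Not from the thread itself: a rough back-of-envelope sketch (illustrative numbers only) of why 4-bit weights free up memory for a longer context window. It assumes a 70B-parameter dense model with Llama-3.1-70B-like KV-cache dimensions (80 layers, 8 KV heads, head dim 128) and an fp16 KV cache; real deployments also pay activation and quantization overheads.

# Illustrative back-of-envelope only (not from the paper): why 4-bit weights
# leave more room for KV cache, and hence longer contexts, on a fixed GPU budget.
params = 70e9                                # assumed 70B-parameter dense model

weights_fp16_gb = params * 2 / 1e9           # ~140 GB of weights at 16-bit
weights_nf4_gb = params * 0.5 / 1e9          # ~35 GB at 4-bit (ignoring quant overhead)
freed_gb = weights_fp16_gb - weights_nf4_gb

# Assumed Llama-3.1-70B-like KV-cache shape: 80 layers, 8 KV heads, head dim 128, fp16.
layers, kv_heads, head_dim = 80, 8, 128
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2   # K and V, 2 bytes each

extra_tokens = freed_gb * 1e9 / kv_bytes_per_token
print(f"weights: {weights_fp16_gb:.0f} GB (fp16) -> {weights_nf4_gb:.0f} GB (nf4)")
print(f"KV cache: ~{kv_bytes_per_token/1e6:.2f} MB/token, "
      f"so ~{extra_tokens/1e3:.0f}K extra tokens fit in the freed weight memory")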
Anmol Mekala (@anmol_mekala) · 15 days
Accepted to EMNLP 2025!!
Anmol Mekala (@anmol_mekala) · 3 months
RT @rishanthrajendh: Long-form factuality metrics like FactScore and VeriScore are accurate but slow (~100s/response): they split text into…
Anmol Mekala (@anmol_mekala) · 3 months
RT @selini0: We went from "RL without external rewards" to "RL with any rewards" in less than 6 hours hahaha. Interesting times https://t.c…
Anmol Mekala (@anmol_mekala) · 3 months
RT @corbtt: New paper! We used GRPO to train Qwen 2.5 on 32 randomly-generated Coq programs that don't compile, and it learned to prove the…
Anmol Mekala (@anmol_mekala) · 3 months
RT @MohitIyyer: 4-bit quantization works fine with short contexts but can really hurt with longer ones! Check out our paper for more details…
Anmol Mekala (@anmol_mekala) · 3 months
Takeaways:
✅ 8-bit quants are consistently robust.
⚠️ Be cautious with 4-bit, especially on long-context 📚 and multilingual 🌍 tasks.
🚫 Testing a few tasks or models isn’t enough: performance on one doesn’t guarantee others.
Anmol Mekala (@anmol_mekala) · 3 months
Quantization effects vary dramatically across models, even similarly sized ones: Qwen-2.5 72B remains robust under BNB-nf4 (+0.6%) on OneRuler, while Llama-3.1 70B sees a massive 59% drop 📉📉 on the same task! Evaluating a single model family isn’t enough ❗️❗️
Anmol Mekala (@anmol_mekala) · 3 months
Long-context tasks (up to 128K tokens): Ruler, OneRuler & NoCha show that quantization losses rise with longer contexts and in multilingual settings. Unlike long-context tasks, long-form generation does not show large drops under quantization.
Anmol Mekala (@anmol_mekala) · 3 months
🔹 8-bit quantization (FP8, GPTQ-int8) maintains near-perfect accuracy (<0.9% average drop).
🔸 But 4-bit quantization (AWQ-int4, GPTQ-int4, BNB-nf4) can degrade sharply on very long contexts 📉
Anmol Mekala (@anmol_mekala) · 3 months
We benchmark five quantization methods (FP8, GPTQ-int8, AWQ-int4, GPTQ-int4, BNB-nf4) across five models (Llama-3.1/Qwen-2.5; 7–72B) on 10K examples from five long-context 📚🔍 (up to 128K tokens) and long-form generation tasks 📚✍️.
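For readers unfamiliar with these settings, here is a minimal sketch (not the paper's actual evaluation harness) of how one of the benchmarked 4-bit configurations, BNB-nf4, can be loaded via Hugging Face transformers with bitsandbytes; the model name and prompt below are placeholders.

# Minimal sketch of the BNB-nf4 setting via transformers + bitsandbytes.
# Illustrative only; the model id and prompt are placeholders, not the paper's harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"      # example; the paper spans 7B-72B models

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                     # NormalFloat4, i.e. "BNB-nf4"
    bnb_4bit_compute_dtype=torch.bfloat16,         # matmuls run in bf16 over 4-bit weights
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Answer based on the context below:\n..."  # placeholder long-context prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))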
Anmol Mekala (@anmol_mekala) · 5 months
RT @kenziyuliu: An LLM generates an article verbatim—did it “train on” the article? It’s complicated: under n-gram definitions of train-se…
Anmol Mekala (@anmol_mekala) · 5 months
RT @vaidehi_patil_: 🚨 Exciting @icmlconf workshop alert 🚨 We’re thrilled to announce the #ICML2025 Workshop on Machine Unlearning for Gener…
Anmol Mekala (@anmol_mekala) · 5 months
RT @mar_kar_: We have updated #nocha, a leaderboard for reasoning over long-context narratives 📖, with some new models including #Gemini 2.…
Anmol Mekala (@anmol_mekala) · 5 months
RT @SketchesbyBoze: the use of chatbots to write essays is a five-alarm fire with the power to destroy education, but we can defeat it easi…
Anmol Mekala (@anmol_mekala) · 6 months
RT @ZhiyuanZeng_: Is a single accuracy number all we can get from model evals? 🤔
🚨 Does NOT tell where the model fails
🚨 Does NOT tell how to…
Anmol Mekala (@anmol_mekala) · 6 months
RT @rohitgandikota: Why do distilled diffusion models generate similar-looking images? 🤔 Our Diffusion Target (DT) visualization reveals t…
Anmol Mekala (@anmol_mekala) · 6 months
RT @goyalsachin007: Realization (again) from research over the past 2 months: A solid open-source framework from “reliable” folks isn’t jus…