
Anmol Mekala
@anmol_mekala
Followers
40
Following
141
Media
5
Statuses
34
AI eng @ Salient | LLM unlearning & benchmarking research | CS @umassamherst, @iitbombay | Formerly @Microsoft
San Francisco, CA
Joined July 2023
📢 New Paper 📢 Struggling to fit very long contexts on your LLM? Considering 4-bit quantization to 2x your context window? Prior work says 4-bit is “good enough,” but on long-context tasks it can drop 16%, with up to 59% drops on specific models❗❗ Details in 🧵👇
4
14
36
Accepted to EMNLP 2025!!
0
0
1
RT @rishanthrajendh: Long-form factuality metrics like FactScore and VeriScore are accurate but slow (~100s/response): they split text into….
0
9
0
RT @selini0: We went from "RL without external rewards" to "RL with any rewards" in less than 6 hours hahaha. Interesting times https://t.c….
0
30
0
RT @corbtt: New paper! We used GRPO to train Qwen 2.5 on 32 randomly-generated Coq programs that don't compile, and it learned to prove the….
0
19
0
RT @MohitIyyer: 4bit quantization works fine with short contexts but can really hurt with longer ones! Check out our paper for more details….
0
9
0
📜 Does quantization affect models’ performance on long-context tasks? Work @UMass_NLP by @aatmakuru6 and myself, guided by @yixiao_song, @mar_kar_ & @MohitIyyer.
arxiv.org
Large language models (LLMs) now support context windows exceeding 128K tokens, but this comes with significant memory requirements and high inference latency. Quantization can mitigate these...
0
1
2
Takeaways: ✅ 8-bit quants are consistently robust. ⚠️ Be cautious with 4-bit, especially on long contexts 📚 and multilingual 🌍 tasks. 🚫 Testing a few tasks or models isn’t enough: performance on one doesn’t guarantee others.
1
1
2
Quantization effects vary dramatically across models, even similarly sized ones: Qwen-2.5 72B remains robust under BNB-nf4 (+0.6%) on OneRuler, while Llama-3.1 70B sees a massive 59% drop 📉📉 on the same task! Evaluating a single model family isn’t enough❗️❗️
1
0
2
Long-context tasks (up to 128K): Ruler, OneRuler & NoCha show that quantization losses rise with longer contexts and in multilingual settings. Unlike long-context tasks, long-form generation does not show large drops under quantization.
1
0
2
🔹 8-bit quantization (FP8, GPTQ-int8) maintains near-perfect accuracy (<0.9% avg drop). 🔸 But 4-bit quantization methods (AWQ-int4, GPTQ-int4, BNB-nf4) can degrade sharply on very long contexts 📉
1
0
2
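For reference, here is a minimal sketch (not the paper’s code) of how the BNB-nf4 setting named above is typically configured with Hugging Face transformers + bitsandbytes; the model id and compute dtype are illustrative assumptions, and the FP8/GPTQ/AWQ variants are usually loaded from pre-quantized checkpoints instead.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative model choice; any Llama-3.1 / Qwen-2.5 checkpoint is loaded the same way.
model_id = "meta-llama/Llama-3.1-8B-Instruct"

# BNB-nf4: the 4-bit setting that can degrade sharply on very long contexts.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=nf4_config,
    device_map="auto",
)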
We benchmark five quantization methods (FP8, GPTQ-int8, AWQ-int4, GPTQ-int4, BNB-nf4) across five models (Llama-3.1/Qwen-2.5; 7–72B) on 10K examples from five long-context 📚🔍 (up to 128K tokens) and long-form generation tasks 📚✍️.
1
0
2
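A hedged sketch of the evaluation grid this tweet describes (models × quantization methods × tasks); evaluate_long_context() is a hypothetical placeholder rather than the paper’s actual harness, and the model list is illustrative.

from itertools import product

# Illustrative lists; the paper evaluates five Llama-3.1 / Qwen-2.5 models (7-72B).
MODELS = ["Llama-3.1-8B", "Llama-3.1-70B", "Qwen2.5-7B", "Qwen2.5-72B"]
QUANTS = ["bf16", "fp8", "gptq-int8", "awq-int4", "gptq-int4", "bnb-nf4"]
TASKS = ["ruler", "oneruler", "nocha"]  # long-context tasks named in the thread

def evaluate_long_context(model_name: str, quant: str, task: str) -> float:
    """Hypothetical placeholder: load the (quantized) model and return task accuracy."""
    return 0.0  # stub so the sketch runs end to end

results = {
    (m, q, t): evaluate_long_context(m, q, t)
    for m, q, t in product(MODELS, QUANTS, TASKS)
}

# Report each quantized setting's drop relative to the bf16 baseline on the same model/task.
for (m, q, t), acc in sorted(results.items()):
    if q == "bf16":
        continue
    drop = results[(m, "bf16", t)] - acc
    print(f"{m:14s} {q:10s} {t:9s} drop vs bf16 = {drop:+.3f}")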
RT @kenziyuliu: An LLM generates an article verbatim—did it “train on” the article?. It’s complicated: under n-gram definitions of train-se….
0
96
0
RT @vaidehi_patil_: 🚨Exciting @icmlconf workshop alert 🚨. We’re thrilled to announce the #ICML2025 Workshop on Machine Unlearning for Gener….
0
19
0
RT @SketchesbyBoze: the use of chatbots to write essays is a five-alarm fire with the power to destroy education, but we can defeat it easi….
0
106
0
RT @ZhiyuanZeng_: Is a single accuracy number all we can get from model evals?🤔.🚨Does NOT tell where the model fails.🚨Does NOT tell how to….
0
92
0
RT @rohitgandikota: Why do distilled diffusion models generate similar-looking images? 🤔. Our Diffusion Target (DT) visualization reveals t….
0
74
0
RT @goyalsachin007: Realization (again) from research over the past 2 months: A solid open-source framework from “reliable” folks isn’t jus….
0
2
0
RT @WeijiaShi2: Another great work by @pratyushmaini. Excited to see our machine unlearning benchmark, MUSE (🔗, n….
arxiv.org
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to...
0
6
0