Alex Chen
@itisalex3
CS Undergrad Researcher (AI/ML) @ UCLA
Joined October 2025
What happens when we compress the KV cache of prompts with multiple instructions? 🤔 Existing compression methods can lead to some instructions being ignored. 🙀 We propose simple changes to KV cache eviction that fix this problem, and we highlight other pitfalls to be aware of. 💯
(1/8) We introduce Sparse-LaViDa, a new framework that accelerates inference for unified multi-modal diffusion language models via a novel sparse parameterization. It achieves up to a 2.8x speedup on tasks including image generation, editing, and visual math reasoning.
"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝 ✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive (AR) generation and an order of magnitude faster than diffusion, while staying within 0.9–5% of AR quality.
Read the paper for more details! Done in collaboration with an awesome team: @danielmisrael, @renatogeh, and advisors @guyvdb and @adityagrover_. Project website: https://t.co/Ax42eQdOnC Paper: https://t.co/fwMpCEy088 GitHub: Itisalex2/pitfalls-of-kv-cache-compression (repository for the paper: https://arxiv.org/abs/2510.00231)
We propose Fair Eviction Policies: forcing each instruction to lose KV entries at equal rates. Like whitelisting, fair eviction lessens the degradation of the defense at only a small cost to directive following.
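A minimal sketch of the fair eviction idea, for intuition only. It assumes each cached token already has an importance score and a label saying which instruction it belongs to. The function name `fair_evict`, the toy scores, and the per-span top-k rule are illustrative assumptions, not the implementation from the paper or its repository.

```python
import math
from collections import defaultdict


def fair_evict(scores, span_ids, keep_ratio):
    """Return sorted indices of KV entries to keep.

    scores: hypothetical per-token importance (higher = more important).
    span_ids: instruction label for each token (assumed to be known).
    keep_ratio: fraction of each instruction's entries to retain.
    """
    by_span = defaultdict(list)
    for idx, span in enumerate(span_ids):
        by_span[span].append(idx)

    keep = []
    for indices in by_span.values():
        # Every instruction gets the same retention rate.
        budget = math.ceil(keep_ratio * len(indices))
        ranked = sorted(indices, key=lambda i: scores[i], reverse=True)
        keep.extend(ranked[:budget])
    return sorted(keep)


# Toy data: the defense tokens happen to score higher than the directive tokens.
scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.05]
span_ids = ["defense"] * 4 + ["directive"] * 4

kept = fair_evict(scores, span_ids, keep_ratio=0.5)

# Plain global top-k with the same total budget, for contrast.
global_top = sorted(
    sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:4]
)

print(kept)        # [0, 1, 5, 6]  two entries survive from each instruction
print(global_top)  # [0, 1, 2, 3]  the directive is evicted entirely
```

In this toy case the global policy spends the whole budget on one instruction, while the per-span budget keeps half of each.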
Evicting the wrong tokens can play a critical role in degradation. Whitelisting key defensive phrases, so that their KV entries are never evicted, dramatically reduces leakage with almost no cost to directive following.
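Whitelisting can be sketched as score-based eviction with a protected set. This is a hedged illustration: `evict_with_whitelist`, the toy scores, and the choice of which positions count as "key defensive phrases" are all assumptions made for the example, not the paper's code.

```python
def evict_with_whitelist(scores, budget, whitelist):
    """Keep `budget` KV entries, always retaining the whitelisted positions.

    scores: hypothetical per-token importance (higher = more important).
    budget: total number of entries allowed to remain in the cache.
    whitelist: token positions that must never be evicted.
    """
    protected = set(whitelist)
    if len(protected) > budget:
        raise ValueError("whitelist exceeds the cache budget")

    # Fill the remaining budget with the highest-scoring unprotected tokens.
    candidates = [i for i in range(len(scores)) if i not in protected]
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return sorted(protected | set(ranked[: budget - len(protected)]))


# Toy data: positions 0 and 1 stand in for a defensive phrase that scores low.
scores = [0.1, 0.05, 0.9, 0.8, 0.7, 0.6]

kept_wl = evict_with_whitelist(scores, budget=3, whitelist=[0, 1])
kept_plain = evict_with_whitelist(scores, budget=3, whitelist=[])

print(kept_wl)     # [0, 1, 2]  the defensive phrase survives
print(kept_plain)  # [2, 3, 4]  the defensive phrase is evicted
```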
Interestingly, swapping the order of the defense and the directive in the system prompt (defense first and directive second, or vice versa) radically changes the degradation pattern of both directive following and leakage.
We use system prompt leakage as a multi-instruction case study. System prompts consist of a) a defense and b) a system directive. The defense prevents prompt leakage. The directive contains instructions for answering user queries. We see that leakage occurs before directive following degrades.
Degradation rates also depend heavily on the KV cache compression method and the model. Different methods (StreamingLLM, H2O, SnapKV, K-norm, TOVA) produce completely different failure modes even at the same compression ratio.
On the IFEval dataset, different instructions can degrade at different rates. We argue that this is driven by 1) hardness and 2) eviction bias, where an eviction policy disproportionately evicts entries belonging to certain instructions when compressing multi-instruction prompts.
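One simple way to make eviction bias concrete is to compare how much of each instruction survives compression. The sketch below is an assumed diagnostic, not a metric taken from the paper: `retention_by_span` and the toy labels are hypothetical.

```python
from collections import Counter


def retention_by_span(span_ids, kept):
    """Fraction of each instruction's KV entries that survive eviction.

    span_ids: instruction label for each token (assumed to be known).
    kept: indices of the entries that remain after compression.
    """
    total = Counter(span_ids)
    survived = Counter(span_ids[i] for i in kept)
    return {span: survived[span] / total[span] for span in total}


# Toy data: two instructions of four tokens each, half the cache is kept overall.
span_ids = ["A"] * 4 + ["B"] * 4
kept_indices = [0, 1, 2, 4]

rates = retention_by_span(span_ids, kept_indices)
print(rates)  # {'A': 0.75, 'B': 0.25}
```

Equal rates would indicate an unbiased policy in this sense; here instruction B loses three times as many entries as instruction A at the same overall ratio.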
KV cache compression promises memory savings, lower latency, and higher throughput at a negligible performance cost. We argue that this performance cost is poorly understood. KV cache diagram from: https://t.co/HEM98oIsO9