
xAlg-ai
@xalg_ai
Followers: 6 · Following: 0 · Media: 5 · Statuses: 8
Accelerating AI algorithmically | UC Berkeley Sky Computing Lab.
Berkeley, CA
Joined October 2025
Excited to share our new research: vAttention - Verified Sparse Attention. Sparse attention with provable quality guarantees for LLMs. Full paper: https://t.co/pvOSEI8E7J GitHub: xAlg-ai/sparse-attention-hub 🧵 A thread 👇
arxiv.org
State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extension, top-$p$) and recently introduced sampling-based...
Also, our blog post is now available on SIGOPS!
Barbarians at The Gate: How AI is Upending Systems Research by @audreyccheng, @LynnLiu41887950, @melissapan, @istoica05, and the @ai4research_ucb team: https://t.co/b6vtMJN3Et This is the first article of The Next Horizon of System Intelligence blog series.
This is joint work out of the Sky Computing Lab, UC Berkeley @BerkeleySky @istoica05 @matei_zaharia @profjoeyg @Alex_Cuadron @luisgschroeder @randwalk0 @kumarkagrawal @Apd10Desai
5/N vAttention, when combined with a good top-k predictor, achieves significantly lower approximation errors, and thus better model quality, at small token budgets.
4/N Even the denominator-guarantee approximation, which is cheaper to compute, shows an extremely high correlation between the specified tolerance and the empirically observed errors.
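As a rough illustration of this claim (my own toy setup, not the paper's benchmark): estimate only the softmax denominator, sum_i exp(s_i), by uniform sampling with a CLT-sized budget, and see how the specified tolerance eps tracks the observed relative error. The budget formula and the synthetic weights are assumptions for illustration.

```python
# Hypothetical mini-experiment: sample-based estimate of the softmax denominator
# with a CLT-derived budget, comparing specified tolerance vs. observed error.
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n, delta = 65536, 0.05
weights = np.exp(rng.standard_normal(n))      # stand-in for exp(q.k_i / sqrt(d))
true_den = weights.sum()
z = NormalDist().inv_cdf(1 - delta / 2)       # normal quantile for confidence 1 - delta

for eps in (0.2, 0.1, 0.05, 0.02):
    pilot = rng.choice(weights, size=256, replace=False)   # pilot sample to estimate variance
    mu, sigma = pilot.mean(), pilot.std()
    m = min(n, max(1, int(np.ceil((z * sigma / (eps * mu)) ** 2))))  # CLT sample size
    est = n * rng.choice(weights, size=m, replace=False).mean()
    rel_err = abs(est - true_den) / true_den
    print(f"eps={eps:.2f}  budget={m:6d}  observed relative error={rel_err:.4f}")
```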
3/N How it works: 1️⃣ Find potential “heavy-hitter” tokens -- including sink, local-window, and approximate top-k tokens 2️⃣ Sample the rest randomly, with a budget determined by the CLT, to ensure (ε, δ) guarantees on the attention output.
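As a rough sketch of these two steps, here is a minimal numpy version for a single query. The function name, the pilot-sample variance estimate, and the specific CLT budget formula are assumptions for illustration; the actual algorithm is in the paper and the xAlg-ai/sparse-attention-hub repo.

```python
# Minimal, hypothetical sketch: exact heavy hitters + CLT-budgeted random
# sampling of the remaining tokens for one query (illustrative only).
import numpy as np
from statistics import NormalDist

def sparse_attention_sketch(q, K, V, eps=0.05, delta=0.05,
                            n_sink=4, n_local=64, n_topk=128, pilot=64, seed=0):
    rng = np.random.default_rng(seed)
    n, d = K.shape
    scores = (K @ q) / np.sqrt(d)                      # attention logits, shape (n,)

    # 1) Heavy hitters: sink tokens, local window, approximate top-k.
    #    A real system would use a cheap top-k predictor; we score exactly here
    #    only to keep the sketch short.
    sink = np.arange(min(n_sink, n))
    local = np.arange(max(0, n - n_local), n)
    topk = np.argsort(scores)[-n_topk:]
    heavy = np.unique(np.concatenate([sink, local, topk]))

    w_heavy = np.exp(scores[heavy])
    num = w_heavy @ V[heavy]                           # heavy-hitter numerator, shape (d,)
    den = w_heavy.sum()

    # 2) Sample the rest with a CLT-derived budget: enough samples so the
    #    estimated residual mass has relative error <= eps w.p. >= 1 - delta.
    rest = np.setdiff1d(np.arange(n), heavy)
    if rest.size > 0:
        m0 = min(pilot, rest.size)
        w_pilot = np.exp(scores[rng.choice(rest, size=m0, replace=False)])
        mu, sigma = w_pilot.mean(), w_pilot.std() + 1e-12
        z = NormalDist().inv_cdf(1 - delta / 2)
        budget = int(np.clip(np.ceil((z * sigma / (eps * mu)) ** 2), m0, rest.size))

        sample = rng.choice(rest, size=budget, replace=False)
        w_s = np.exp(scores[sample])
        num += (rest.size / budget) * (w_s @ V[sample])   # inverse-probability scaling
        den += (rest.size / budget) * w_s.sum()

    return num / den

# Toy comparison against exact attention.
n, d = 8192, 64
rng = np.random.default_rng(1)
q, K, V = rng.standard_normal(d), rng.standard_normal((n, d)), rng.standard_normal((n, d))
w = np.exp((K @ q) / np.sqrt(d))
print(np.max(np.abs(sparse_attention_sketch(q, K, V) - (w @ V) / w.sum())))
```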
2/N vAttention introduces verified sparse attention: you choose an error tolerance (ε, δ), and it ensures the approximation error stays within that bound with high probability.
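One plausible way to write this kind of guarantee, with o the exact attention output and ô the sparse approximation (the choice of error metric here is my assumption; the paper defines the precise one):

```latex
% Illustrative (epsilon, delta) guarantee on the attention output:
% the approximation error exceeds epsilon with probability at most delta.
\Pr\left[\,\lVert \hat{o} - o \rVert \le \varepsilon \,\right] \;\ge\; 1 - \delta
```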
(1/N) LLMs choke on long contexts because full attention over long sequences puts immense pressure on memory. Sparse attention holds promise, but current approaches do not deliver: they cannot adapt to the variety of query- and head-specific attention patterns.
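For a rough sense of that memory pressure, here is a back-of-envelope KV-cache calculation; the model shape (32 layers, 8 KV heads, head dim 128, 16-bit values) is a generic assumption for illustration, not a specific model.

```python
# Back-of-envelope KV-cache size: it grows linearly with context length.
layers, kv_heads, head_dim, bytes_per = 32, 8, 128, 2   # fp16/bf16 values

def kv_cache_gib(seq_len):
    # 2 tensors (K and V) per layer, per KV head, per token
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per / 2**30

for seq_len in (8_192, 131_072, 1_048_576):
    print(f"{seq_len:>9,} tokens -> {kv_cache_gib(seq_len):6.1f} GiB per sequence")
```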