xAlg-ai Profile
xAlg-ai

@xalg_ai

Followers: 6 · Following: 0 · Media: 5 · Statuses: 8

Accelerating AI algorithmically | UC Berkeley Sky Computing Lab.

Berkeley, CA
Joined October 2025
@xalg_ai
xAlg-ai
9 days
Excited to share our new research: vAttention - Verified Sparse Attention. Sparse attention with provable quality guarantees for LLMs. Full paper: https://t.co/pvOSEI8E7J GitHub: xAlg-ai/sparse-attention-hub 🧵 A thread 👇
arxiv.org
State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-k (and its extension, top-p) and recently introduced sampling-based...
1
5
11
@ai4research_ucb
AI-Driven Research Systems
2 days
Also, our blog is available on SIGOPS!
@ACMSIGOPS
ACM SIGOPS
2 days
Barbarians at The Gate: How AI is Upending Systems Research by @audreyccheng, @LynnLiu41887950, @melissapan, @istoica05, and the @ai4research_ucb team: https://t.co/b6vtMJN3Et This is the first article in The Next Horizon of System Intelligence blog series.
0
3
11
@xalg_ai
xAlg-ai
9 days
5/N vAttention, when combined with a good top-k predictor, achieves significantly lower approximation error, and therefore better model quality, at small token budgets.
1
0
1
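A hedged sketch of what such a pluggable top-k predictor could look like and how its picks would join the set of tokens scored exactly, with everything else left to the sampling step described further down the thread (3/N). The low-rank projection and every name below are illustrative assumptions, not the xAlg-ai/sparse-attention-hub API.

```python
# Illustrative sketch only: a cheap low-rank top-k predictor feeding the set of
# tokens that get exact attention scores; the remainder is handled by sampling.
import numpy as np


def lowrank_topk_predictor(q, K_low, proj, k):
    """Approximate top-k via down-projected keys (one assumed example of a
    'good' predictor; quantized or clustered keys are other common choices)."""
    approx_scores = K_low @ (proj.T @ q)        # scores in the low-rank space
    return np.argpartition(approx_scores, -k)[-k:]


def exact_token_set(n_tokens, predicted, n_sink=4, window=64):
    """Tokens scored exactly: sink tokens, the local window, and the
    predictor's picks; the rest is left to budgeted random sampling."""
    keep = set(range(min(n_sink, n_tokens)))
    keep |= set(range(max(0, n_tokens - window), n_tokens))
    keep |= {int(i) for i in predicted}
    return np.array(sorted(keep))
```

The better the predictor's recall of true heavy hitters, the less attention mass is left to the sampled tail, which is presumably why the approximation error drops at small token budgets.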
@xalg_ai
xAlg-ai
9 days
4/N Even the denominator-guarantee approximation, which is cheaper to compute, shows an extremely high correlation between the specified tolerance and the empirically observed error.
1
0
1
@xalg_ai
xAlg-ai
9 days
3/N How it works: 1️⃣ Find potential “heavy-hitter” tokens -- including sink tokens, the local window, and approximate top-k 2️⃣ Sample the rest randomly, with a budget determined by the CLT, to ensure (ε, δ) guarantees on the attention output
1
0
1
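A minimal NumPy sketch of this two-step recipe for a single decode query. It is an illustration under stated assumptions, not the authors' implementation: the heavy-hitter heuristics, the pilot-sample CLT budget rule, and all names are assumptions.

```python
# Sketch of verified sparse attention for one query: exact scores over a
# heavy-hitter set, plus a CLT-budgeted uniform sample over the remaining tail.
import numpy as np
from statistics import NormalDist


def clt_budget(pilot_scores, eps, delta, n_tail):
    """CLT-style sample size so the sampled estimate of sum(exp(scores)) over
    the tail has relative error <= eps with probability ~ 1 - delta."""
    vals = np.exp(pilot_scores)
    mean, std = float(vals.mean()), float(vals.std())
    if mean <= 0.0 or std == 0.0:
        return len(pilot_scores)
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)   # e.g. ~1.96 for delta = 0.05
    n = int(np.ceil((z * std / (eps * mean)) ** 2))
    return min(max(n, len(pilot_scores)), n_tail)


def sparse_attention_sketch(q, K, V, eps=0.05, delta=0.05,
                            n_sink=4, window=64, k_top=32, n_pilot=32, seed=0):
    """Approximate softmax(q K^T / sqrt(d)) @ V.
    Step 1: score a heavy-hitter set exactly (sink + local window + top-k).
    Step 2: estimate the remaining tokens' contribution by uniform sampling,
    with the budget chosen by the CLT rule above."""
    n, d = K.shape
    scores = (K @ q) / np.sqrt(d)   # exact scores for clarity; a real system
                                    # would use a cheap proxy for the top-k step
    # Step 1: heavy hitters.
    heavy = set(range(min(n_sink, n)))                  # sink tokens
    heavy |= set(range(max(0, n - window), n))          # local window
    k_top = min(k_top, n)
    heavy |= set(np.argpartition(scores, -k_top)[-k_top:].tolist())  # top-k
    heavy = np.fromiter(sorted(heavy), dtype=int)
    tail = np.setdiff1d(np.arange(n), heavy)

    w_heavy = np.exp(scores[heavy])
    num = w_heavy @ V[heavy]        # exact numerator over the heavy set
    den = w_heavy.sum()             # exact denominator over the heavy set

    # Step 2: CLT-budgeted uniform sampling over the tail.
    if tail.size > 0:
        rng = np.random.default_rng(seed)
        pilot = rng.choice(tail, size=min(n_pilot, tail.size), replace=False)
        budget = clt_budget(scores[pilot], eps, delta, tail.size)
        sample = rng.choice(tail, size=budget, replace=False)
        scale = tail.size / budget  # unbiased scale-up of the sampled sums
        w = np.exp(scores[sample])
        num = num + scale * (w @ V[sample])
        den = den + scale * w.sum()
    return num / den
```

Keeping the numerator and denominator sums separate is what makes the scaled tail sample an unbiased plug-in; a production kernel would also subtract a running max before exponentiating for numerical stability.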
@xalg_ai
xAlg-ai
9 days
2/N vAttention introduces verified sparse attention: you choose an error tolerance (ε, δ), and it ensures the approximation error stays within bounds with high probability.
1
0
1
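One plausible way to formalize that guarantee (the paper's exact error metric and norm may differ) is a relative-error bound on the attention output that holds with probability at least 1 − δ:

```latex
% Hedged formalization: o is the exact attention output for a query,
% \hat{o} the sparse estimate, and (\epsilon, \delta) the user-chosen tolerance.
\Pr\bigl[\, \lVert \hat{o} - o \rVert_2 \le \epsilon \, \lVert o \rVert_2 \,\bigr] \;\ge\; 1 - \delta
```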
@xalg_ai
xAlg-ai
9 days
1/N LLMs choke on long contexts because full attention over long sequences puts immense pressure on memory. Sparse attention holds promise, but current approaches do not deliver: they cannot adapt to the variety of query- and head-specific attention patterns.
1
0
1