Explore tweets tagged as #Sparse
@zhenyishen22
Zhenyi Shen
2 hours
Sparse-attention (SA) models should resemble full attention (FA) when acting as a proxy, but in practice SA training often produces surprisingly low sparsity in its attention maps, making them potentially suboptimal. Introducing SSA (Sparse Sparse Attention), a new…
0
0
2
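The claim above (that sparse-attention training yields attention maps that are not actually sparse) can be checked with a simple diagnostic. Below is a minimal sketch, assuming sparsity is measured as the share of attention mass captured by each query's top-k keys; this is an illustrative metric, not necessarily the one used in the SSA paper.

```python
# Minimal sketch (not from the SSA paper): quantify how "sparse" an attention
# map is via the fraction of attention mass captured by the top-k keys per
# query. Values near 1.0 mean the map is effectively sparse.
import torch

def topk_mass(attn: torch.Tensor, k: int) -> torch.Tensor:
    """attn: (batch, heads, q_len, k_len), rows summing to 1."""
    topk = attn.topk(k, dim=-1).values       # largest-k weights per query
    return topk.sum(dim=-1).mean()           # average mass held by the top-k keys

if __name__ == "__main__":
    q_len, k_len = 128, 128
    scores = torch.randn(1, 8, q_len, k_len)
    attn = scores.softmax(dim=-1)
    print(f"top-16 mass: {topk_mass(attn, 16):.3f}")  # a low value means "not very sparse"
```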
@seiji_________
Seiji Eicher
7 hours
Wide-EP and prefill/decode disaggregation APIs for vLLM are now available in Ray 2.52 🚀🚀 Validated at 2.4k tokens/H200 on Anyscale Runtime, these patterns maximize sparse MoE model inference efficiency, but often require non-trivial orchestration logic. Here’s how they…
1
9
9
@EkdeepL
Ekdeep Singh
13 days
New paper! Language has rich, multiscale temporal structure, but sparse autoencoders assume features are *static* directions in activations. To address this, we propose Temporal Feature Analysis: a predictive coding protocol that models dynamics in LLM activations! (1/14)
7
56
269
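For context on the contrast the thread draws (static SAE directions vs. temporal structure), here is a minimal sketch of a predictive view of activation dynamics: fit a linear predictor of the next token's activation from the current one and see how much variance it explains. This is a toy illustration on synthetic data, not the Temporal Feature Analysis protocol itself.

```python
# Minimal sketch of the contrast drawn above (not the paper's actual protocol):
# a sparse autoencoder treats each token's activation independently, while a
# predictive view splits an activation into a part predictable from context and
# a residual "surprise". The predictor here is plain ridge regression from the
# previous token's activation; the real method will differ.
import numpy as np

rng = np.random.default_rng(0)
T, d = 512, 64                                        # tokens, hidden size (toy data)
acts = np.cumsum(rng.normal(size=(T, d)), axis=0)     # activations with temporal structure

X, Y = acts[:-1], acts[1:]                            # predict activation t+1 from activation t
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)   # ridge solution

residual = Y - X @ W                                  # the dynamic part a static dictionary would miss
explained = 1 - residual.var() / Y.var()
print(f"variance explained by the temporal predictor: {explained:.2f}")
```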
@_Vasu_609
Vash
15 hours
Diving into DeepSeek’s Sparse Attention—smartly trimming down compute by focusing only on key tokens, while managing ultra-long contexts. Learning how efficient AI really works. #AI #DeepSeek
0
0
2
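Below is a minimal sketch of the general idea behind token-selection sparse attention: keep only the top-k highest-scoring keys per query and attend over that subset. This is a generic reference version, not DeepSeek's actual DSA kernel; it still scores every key and merely masks the rest, whereas a real implementation gathers only the selected keys to save compute.

```python
# Minimal sketch of token-selection sparse attention (not DeepSeek's exact DSA):
# score all keys, keep the top-k per query, and run softmax attention over that
# subset. A production kernel would gather only the selected keys/values; dense
# masking is used here purely for clarity.
import torch

def topk_sparse_attention(q, k, v, topk: int):
    """q, k, v: (batch, heads, seq, dim)."""
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5     # (b, h, q_len, k_len)
    idx = scores.topk(topk, dim=-1).indices                   # best k keys per query
    mask = torch.full_like(scores, float("-inf")).scatter(-1, idx, 0.0)
    attn = (scores + mask).softmax(dim=-1)                    # zero weight outside the top-k
    return attn @ v

if __name__ == "__main__":
    q = torch.randn(1, 4, 256, 32)
    k = torch.randn(1, 4, 256, 32)
    v = torch.randn(1, 4, 256, 32)
    out = topk_sparse_attention(q, k, v, topk=32)
    print(out.shape)   # torch.Size([1, 4, 256, 32])
```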
@rosinality
Rosinality
18 hours
Attention weights of sparse attention are ironically not very sparse. Thus they tried to use an alignment loss from full attention. This could be an inherent difficulty of sparse training.
1
8
65
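One plausible form of the alignment loss mentioned above is a KL term pulling the sparse branch's attention distribution toward the full-attention one. The sketch below assumes exactly that; the loss actually used in the paper may differ.

```python
# Minimal sketch of an "align to full attention" auxiliary loss (one plausible
# form; the exact loss in the paper may differ): a KL divergence pulling the
# sparse branch's attention distribution toward the full-attention teacher.
import torch
import torch.nn.functional as F

def attention_alignment_loss(sparse_attn_logits, full_attn_logits):
    """Both tensors: (batch, heads, q_len, k_len) pre-softmax scores."""
    log_p_sparse = F.log_softmax(sparse_attn_logits, dim=-1)
    p_full = F.softmax(full_attn_logits.detach(), dim=-1)   # full attention acts as a frozen teacher
    return F.kl_div(log_p_sparse, p_full, reduction="batchmean")

if __name__ == "__main__":
    s = torch.randn(2, 4, 64, 64, requires_grad=True)
    f_full = torch.randn(2, 4, 64, 64)
    loss = attention_alignment_loss(s, f_full)
    loss.backward()
    print(float(loss))
```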
@DaysOfNoahh
DaysOfNoahh
6 days
#SPARS was predicted to start this year. Sounds like #SPARSE. Sparse meaning: little, dispersed, in short supply. Was SPARS the prediction of the #ShortagePandemic? Shortage of money/credit, shortage of food, water, energy etc? Sparse can also relate to low population figures...
11
20
49
@StevenYang_desu
Steven Yang ᯅ
19 hours
Exploring a sparse-camera 4D Gaussian Splatting rig. Instead of 20–40 cameras and costly sync hardware, our prototype uses 8–10 second-hand phones with app-based sync + capture, cutting down the cost from $30K to $700. #immersiveexperience #4dgs
1
0
2
@ZhihuFrontier
Zhihu Frontier
2 days
What will the next-gen LLM architecture look like? This question keeps sparking debates — and Zhihu contributor & developer Yuxuan offers a sharp comparison between DeepSeek Sparse Attention (DSA) and Native Sparse Attention (NSA), plus a practical look at implementing DSA
0
1
38
@alfredplpl
あるふ
2 days
Our company AIdeaLab has released a new video generation model. It is a video generation model built on Sparse MoE, an approach no publicly available model currently uses. Please give it a try. Details here ↓
2
71
206
@bse_barcelona
Barcelona School of Economics
12 hours
@nathjjones @manuel_lle_an @Raquellorv @LuisigMenendez @IDEA_UAB @amil_camilo @ElliotMotte Yaping studies how aggregate conditions and idiosyncratic shocks shape the quantiles of economic outcomes, and shows that an approach using common factors and a sparse set of predictors improves quantile forecasts, even when factors are weak. https://t.co/s6P3hQ4oIv
1
2
11
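Below is a minimal sketch of the flavor of approach described above: extract common factors from a large panel with PCA, append a small set of observed predictors, and fit an L1-penalised quantile regression so that only a sparse subset of predictors receives nonzero weight. The data and estimator choices here are illustrative, not the paper's.

```python
# Minimal sketch (not the paper's estimator): common factors from PCA plus a
# handful of observed predictors, fed into an L1-penalised quantile regression
# so only a sparse subset of predictors gets nonzero weight.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p = 400, 50
panel = rng.normal(size=(n, p))                      # large panel of series (toy data)
predictors = rng.normal(size=(n, 10))                # candidate idiosyncratic predictors
y = panel[:, :3].mean(axis=1) + 0.5 * predictors[:, 0] + rng.normal(scale=0.5, size=n)

factors = PCA(n_components=3).fit_transform(panel)   # estimated common factors
X = np.hstack([factors, predictors])                 # factors + sparse predictor set

q90 = QuantileRegressor(quantile=0.9, alpha=0.05)    # pinball loss + L1 penalty
q90.fit(X, y)
print("nonzero coefficients:", int(np.sum(np.abs(q90.coef_) > 1e-6)))
```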
@Mifucius1
Zhenxing Mi
1 day
Excited to share our new paper "One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control" on arXiv. With our novel designs of Unified Masked Conditioning (UMC) and Decoupled LoRA Control (DLC), One4D can seamlessly handle single-image-to-4D, sparse-frame-to-4D, …
0
2
8
@Dr_lingxiao
Ling XIAO
2 days
Prof. Zhu Li (University of Missouri) gave a wonderful talk to my lab, titled ‘Sparse Feature Pyramid Recovery for Vision Tasks under Adverse Imaging Conditions’.
0
0
2
@sparse2243
11 days
I met the idols! #シャニソン https://t.co/BRRDJKL9cN Mitsumine is so cute
0
0
3
@Marktechpost
Marktechpost AI Dev News ⚡
12 days
OpenAI Researchers Train Weight Sparse Transformers to Expose Interpretable Circuits
OpenAI’s new work on “sparse circuits” trains GPT-2 style language models so that almost all weights are zero, which forces behaviors like quote matching and variable type tracking to run…
1
7
15
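Below is a minimal sketch of one way to push a network toward mostly-zero weights (periodic magnitude pruning of Linear layers); OpenAI's actual weight-sparse training recipe is more involved and is not reproduced here.

```python
# Minimal sketch of one way to train with mostly-zero weights (straightforward
# magnitude pruning at a fixed sparsity target); not OpenAI's actual recipe.
import torch
import torch.nn as nn

def apply_weight_sparsity(module: nn.Module, sparsity: float = 0.99) -> None:
    """Zero out the smallest-magnitude weights in every Linear layer, in place."""
    with torch.no_grad():
        for m in module.modules():
            if isinstance(m, nn.Linear):
                w = m.weight
                k = int(w.numel() * sparsity)                  # how many weights to zero
                if k == 0:
                    continue
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())          # keep only the largest weights

if __name__ == "__main__":
    mlp = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256))
    apply_weight_sparsity(mlp, sparsity=0.99)                  # e.g. call every few training steps
    linears = [m for m in mlp if isinstance(m, nn.Linear)]
    zero_frac = sum((m.weight == 0).float().mean().item() for m in linears) / len(linears)
    print(f"average fraction of zero weights per layer: {zero_frac:.3f}")
```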
@TransluceAI
Transluce
6 days
Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!
5
66
254
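Below is a minimal sketch of the kind of question above, under simple assumptions: score each MLP hidden neuron's contribution on one input (activation magnitude times output-weight norm), keep only the top few neurons, and see how much of the layer's output survives. This is not Transluce's method, just an illustration of what a sparse neuron-level circuit would mean.

```python
# Minimal sketch (not Transluce's method): keep only the top-k most relevant
# MLP neurons for one input and compare the pruned output to the full output.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, h = 64, 256
w_in, w_out = nn.Linear(d, h), nn.Linear(h, d)

x = torch.randn(1, d)
acts = torch.relu(w_in(x))                           # hidden neuron activations
full_out = w_out(acts)

# attribution: activation magnitude times the norm of each neuron's output weights
scores = acts.abs().squeeze(0) * w_out.weight.norm(dim=0)
keep = scores.topk(16).indices                       # a sparse "circuit" of 16 neurons

mask = torch.zeros(h)
mask[keep] = 1.0
sparse_out = w_out(acts * mask)

cos = torch.cosine_similarity(full_out, sparse_out, dim=-1).item()
print(f"cosine similarity with only 16/256 neurons: {cos:.3f}")
```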
@skylight_org
SkyLight
13 hours
Sparse attention offers a viable path to solving the computational and memory issues of long-context LLM serving. However, we observe a surprising gap between the rapid pace of academic research and the reality of open-source adoption. We identify the core barrier as…
2
8
13
@AccBalanced
b/acc🔜 AWS re:Invent ☁️🎰
3 days
@zephyr_z9 A simple, ultra-compatible, software-based HBF is already on the market with @WekaIO Augmented Memory Grid over NVLink. It matches HBM inference performance for sparse and dense models by reducing KV Cache GPU prefills from O(n) to O(1).
0
1
2
@AmairaaSharma
Amaira Sharma
5 days
#Homebound is a gut punch. Two friends walk home through a country that keeps failing them, and yet, Ghaywan finds beauty in their bruised hope. Sparse, haunting performances. Frames heavy with truth. A story that lingers like dust on skin. A film you feel, you carry @ghaywan
0
10
26
@askalphaxiv
alphaXiv
2 days
This Meta paper just gave an empirical proof of why RL updates in LLMs are so sparse! They proposed this “Three-Gate Theory” & showed that pretrained models have highly structured optimization landscapes, so geometry-aware RL is much better than heuristic-based SFT methods.
6
26
164
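Below is a minimal sketch of one way to quantify "sparse RL updates": diff a checkpoint before and after fine-tuning and count the fraction of parameters that moved appreciably. This is an illustrative measurement only, not the paper's Three-Gate analysis.

```python
# Minimal sketch (not the paper's "Three-Gate" analysis): measure update
# sparsity as the fraction of parameters whose relative change after RL
# fine-tuning exceeds a small tolerance.
import copy
import torch
import torch.nn as nn

def update_sparsity(model_before, model_after, rel_tol: float = 1e-3) -> float:
    """Fraction of parameters whose relative change exceeds rel_tol."""
    changed, total = 0, 0
    params_after = dict(model_after.named_parameters())
    for name, p_before in model_before.named_parameters():
        delta = (params_after[name] - p_before).abs()
        moved = delta > rel_tol * (p_before.abs() + 1e-8)
        changed += moved.sum().item()
        total += p_before.numel()
    return changed / total

if __name__ == "__main__":
    base = nn.Linear(128, 128)
    tuned = copy.deepcopy(base)
    with torch.no_grad():                              # fake an update that only touches a few rows
        tuned.weight[:4] += 0.05 * torch.randn(4, 128)
    print(f"fraction of parameters updated: {update_sparsity(base, tuned):.4f}")
```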