Explore tweets tagged as #Sparse
Sparse-attention (SA) models should resemble full attention (FA) when acting as a proxy — but in practice, SA training often produces surprisingly low sparsity in its attention maps, making them potentially suboptimal. Introducing SSA (Sparse Sparse Attention), a new
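A quick way to see the phenomenon described above is to measure how concentrated an attention distribution actually is. A minimal numpy sketch, assuming a toy top-k mass metric and random logits (not the SSA paper's protocol):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def topk_mass(attn_row: np.ndarray, k: int) -> float:
    """Fraction of the attention probability mass carried by the k largest entries."""
    return float(np.sort(attn_row)[::-1][:k].sum())

rng = np.random.default_rng(0)
scores = rng.normal(size=4096)   # one query's raw attention logits (toy data)
probs = softmax(scores)

# If the learned attention were truly sparse, a small k would capture most of the mass.
for k in (16, 64, 256):
    print(f"top-{k} mass: {topk_mass(probs, k):.3f}")
```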
Wide-EP and prefill/decode disaggregation APIs for vLLM are now available in Ray 2.52 🚀🚀 Validated at 2.4k tokens/H200 on Anyscale Runtime, these patterns maximize sparse MoE model inference efficiency, but often require non-trivial orchestration logic. Here’s how they
New paper! Language has rich, multiscale temporal structure, but sparse autoencoders assume features are *static* directions in activations. To address this, we propose Temporal Feature Analysis: a predictive coding protocol that models dynamics in LLM activations! (1/14)
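The contrast the thread draws is between static feature directions and a model that predicts the next activation from the recent past. A toy numpy sketch of that contrast, assuming a linear next-step predictor on synthetic activations (not the paper's Temporal Feature Analysis protocol):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 512, 64                                        # toy "LLM activation" trajectory: T tokens, d dims
acts = np.cumsum(rng.normal(size=(T, d)), axis=0)     # random walk, so temporally structured

# Temporal view (predictive-coding flavour): predict activation t from activation t-1.
X, Y = acts[:-1], acts[1:]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)             # linear next-step predictor
pred_err = np.mean((X @ W - Y) ** 2)

# Static view (SAE-like baseline): a fixed direction, here just the mean activation.
static_err = np.mean((acts - acts.mean(axis=0)) ** 2)

print(f"next-step prediction MSE: {pred_err:.3f}")
print(f"static (mean-direction) MSE: {static_err:.3f}")
```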
The attention weights of sparse attention are ironically not very sparse. Thus they tried using an alignment loss from full attention. This could point to an inherent difficulty of sparse training.
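A hedged sketch of what an alignment loss from full attention could look like: renormalize the full-attention distribution over the keys the sparse pattern keeps and penalize the KL divergence between the two. The top-k masking and the KL form here are assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n = 1024
logits = rng.normal(size=n)

full = softmax(logits)                     # full-attention distribution over n keys

# Sparse attention: keep only the top-k logits, renormalize over the kept set.
k = 64
keep = np.argsort(logits)[-k:]
sparse_logits = np.full(n, -np.inf)
sparse_logits[keep] = logits[keep]
sparse = softmax(sparse_logits)

# Alignment loss: KL(full_kept || sparse) on the kept positions,
# pushing the sparse distribution toward the full-attention teacher.
eps = 1e-12
full_kept = full[keep] / full[keep].sum()
align_loss = np.sum(full_kept * (np.log(full_kept + eps) - np.log(sparse[keep] + eps)))
print(f"alignment KL on kept positions: {align_loss:.4f}")
```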
#SPARS was predicted to start this year. Sounds like #SPARSE. Sparse meaning: little, dispersed, in short supply. Was SPARS the prediction of the #ShortagePandemic? Shortage of money/credit, shortage of food, water, energy etc.? Sparse can also relate to low population figures...
Exploring a sparse-camera 4D Gaussian Splatting rig. Instead of 20–40 cameras and costly sync hardware, our prototype uses 8–10 second-hand phones with app-based sync + capture, cutting down the cost from $30K to $700. #immersiveexperience #4dgs
What will the next-gen LLM architecture look like? This question keeps sparking debates — and Zhihu contributor & developer Yuxuan offers a sharp comparison between DeepSeek Sparse Attention (DSA) and Native Sparse Attention (NSA), plus a practical look at implementing DSA
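At a high level, both DSA and NSA select a small subset of keys per query before running attention. A minimal numpy sketch of that shared top-k selection pattern, with a plain dot-product scorer standing in for the real indexers:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, K, V, k=64):
    """Attend from one query to only its k highest-scoring keys."""
    scores = K @ q                                # cheap selection score, one per key
    idx = np.argsort(scores)[-k:]                 # indices of the k best keys
    weights = softmax(K[idx] @ q / np.sqrt(q.shape[-1]))
    return weights @ V[idx], idx

rng = np.random.default_rng(0)
n, d = 8192, 128
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out, kept = topk_sparse_attention(q, K, V, k=64)
print(out.shape, kept.shape)                      # (128,) (64,)
```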
Our company AIdeaLab has released a new video generation model. It is a video generation model built on Sparse MoE, an approach not publicly available elsewhere at present. Please give it a try. Details here ↓
@nathjjones @manuel_lle_an @Raquellorv @LuisigMenendez @IDEA_UAB @amil_camilo @ElliotMotte Yaping studies how aggregate conditions and idiosyncratic shocks shape the quantiles of economic outcomes, and shows that an approach using common factors and a sparse set of predictors improves quantile forecasts, even when factors are weak. https://t.co/s6P3hQ4oIv
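A hedged sketch of that flavour of model using scikit-learn: PCA factors plus an L1-penalised quantile regression, with made-up data and hyperparameters (not Yaping's actual specification):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p = 400, 60
X = rng.normal(size=(n, p))                        # panel of candidate predictors
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)   # toy outcome

factors = PCA(n_components=3).fit_transform(X)     # common factors
design = np.hstack([factors, X])                   # factors + raw predictors

# The L1 penalty keeps only a sparse set of predictors alongside the factors.
model = QuantileRegressor(quantile=0.9, alpha=0.1, solver="highs")
model.fit(design, y)
print("non-zero coefficients:", np.sum(np.abs(model.coef_) > 1e-8))
```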
Excited to share our new paper "One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control" on arXiv. With our novel designs of Unified Masked Conditioning (UMC) and Decoupled LoRA Control (DLC), One4D can seamlessly handle single-image-to-4D, sparse-frame-to-4D,
Prof. Zhu Li (University of Missouri) gave a wonderful talk to my lab, titled ‘Sparse Feature Pyramid Recovery for Vision Tasks under Adverse Imaging Conditions’.
OpenAI Researchers Train Weight Sparse Transformers to Expose Interpretable Circuits
OpenAI’s new work on “sparse circuits” trains GPT-2 style language models so that almost all weights are zero, which forces behaviors like quote matching and variable type tracking to run
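"Almost all weights are zero" can be enforced with a simple magnitude projection after each optimizer step. A toy numpy sketch of that mechanism, with an assumed keep fraction; OpenAI's actual training recipe is not reproduced here:

```python
import numpy as np

def project_to_sparse(W: np.ndarray, keep_frac: float) -> np.ndarray:
    """Zero out all but the largest-magnitude weights (magnitude projection)."""
    k = max(1, int(keep_frac * W.size))
    thresh = np.sort(np.abs(W), axis=None)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))

# After each optimizer step, project the weights back onto the sparse set,
# so only a tiny fraction of connections can carry any behaviour.
W = project_to_sparse(W, keep_frac=0.01)
print("fraction non-zero:", np.mean(W != 0.0))
```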
Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!
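A crude way to test whether a handful of neurons carries a computation is to ablate all the others and check how much the output changes. A toy numpy sketch of such a faithfulness check on one random MLP layer; the selection rule and metric are illustrative, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 64, 1024, 64
W1 = rng.normal(size=(d_in, d_hidden)) / 8
W2 = rng.normal(size=(d_hidden, d_out)) / 32
x = rng.normal(size=(16, d_in))                     # a small batch of inputs

h = np.maximum(x @ W1, 0.0)                         # MLP hidden activations (ReLU)
full_out = h @ W2

# Keep only the k neurons with the largest mean activation; zero the rest.
k = 32
keep = np.argsort(h.mean(axis=0))[-k:]
h_sparse = np.zeros_like(h)
h_sparse[:, keep] = h[:, keep]
sparse_out = h_sparse @ W2

# "Faithfulness": how much of the full output the sparse neuron circuit reproduces.
err = np.linalg.norm(full_out - sparse_out) / np.linalg.norm(full_out)
print(f"relative error with {k}/{d_hidden} neurons: {err:.3f}")
```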
Sparse attention offers a viable path to solving the computational and memory issues of long-context LLM serving. However, we observe a surprising gap between the rapid pace of academic research and the reality of open-source adoption. We identify the core barrier as
@zephyr_z9 A simple, ultra-compatible, software-based HBF is already on the market with @WekaIO Augmented Memory Grid over NVLink. It matches HBM inference performance for sparse and dense models by reducing KV cache GPU prefills from O(n) to O(1)
#Homebound is a gut punch. Two friends walk home through a country that keeps failing them, and yet, Ghaywan finds beauty in their bruised hope. Sparse, haunting performances. Frames heavy with truth. A story that lingers like dust on skin. A film you feel, you carry @ghaywan
This Meta paper just gave an empirical proof of why RL updates in LLMs are so sparse! They proposed this "Three-Gate Theory" & showed that pretrained models have highly structured optimization landscapes, so geometry-aware RL is much better than heuristic-based SFT methods
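One way to make "sparse RL updates" concrete is to compare parameters before and after fine-tuning and count how many moved meaningfully. A toy numpy sketch of that measurement, with synthetic deltas and an arbitrary threshold (not the paper's analysis):

```python
import numpy as np

def update_sparsity(before: np.ndarray, after: np.ndarray, rel_tol: float = 1e-3) -> float:
    """Fraction of parameters whose relative change exceeds rel_tol."""
    delta = np.abs(after - before)
    scale = np.abs(before) + 1e-8
    return float(np.mean(delta / scale > rel_tol))

rng = np.random.default_rng(0)
theta = rng.normal(size=1_000_000)

# Simulate an "RL-like" update that touches few coordinates and an "SFT-like"
# update that perturbs everything a little.
rl_update = theta.copy()
idx = rng.choice(theta.size, size=5_000, replace=False)
rl_update[idx] += rng.normal(scale=0.1, size=idx.size)

sft_update = theta + rng.normal(scale=0.01, size=theta.size)

print("RL-style fraction of parameters changed:", update_sparsity(theta, rl_update))
print("SFT-style fraction of parameters changed:", update_sparsity(theta, sft_update))
```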