
Piotr Nawrot
@p_nawrot
8K Followers · 951 Following · 32 Media · 391 Statuses
LLM Efficiency PhD @ Edinburgh | 🥇🥈 @ Flunkyball Polish Championships | 🥇 @ Jerry Hunter Pub's Bowling Tournament | 50000 🏆 & Legendary II @ Brawl Stars
Warsaw
Joined July 2014
Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs. We performed the most comprehensive study on training-free sparse attention to date. Here is what we found:
11 replies · 112 reposts · 649 likes
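As a rough illustration of the idea in the tweet above, here is a minimal, hypothetical sketch of one common training-free sparse-attention pattern: each query attends only to its top-k highest-scoring keys instead of the full context. The function name `topk_sparse_attention` and all parameters are illustrative assumptions, not the specific methods evaluated in the study.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """Toy training-free sparse attention (illustrative only):
    each query attends to its top_k highest-scoring keys.
    q, k, v have shape (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)    # (B, H, Lq, Lk)
    top_k = min(top_k, scores.size(-1))
    # Per-query threshold: the smallest of the top_k scores.
    kth = scores.topk(top_k, dim=-1).values[..., -1:]
    # Mask everything below the threshold so softmax ignores it.
    sparse_scores = scores.masked_fill(scores < kth, float("-inf"))
    weights = F.softmax(sparse_scores, dim=-1)
    return weights @ v

# Example: 1 sequence, 4 heads, 1024 tokens, 64-dim heads.
q = torch.randn(1, 4, 1024, 64)
k = torch.randn(1, 4, 1024, 64)
v = torch.randn(1, 4, 1024, 64)
out = topk_sparse_attention(q, k, v, top_k=128)   # (1, 4, 1024, 64)
```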
RT @s_scardapane: *The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs* by @p_nawrot @PontiEdoardo @cheeesio @seb_ruder. T….
0 replies · 28 reposts · 0 likes
RT @PontiEdoardo: Thanks for acknowledging Dynamic Token Pooling as a predecessor to H-Net, @_albertgu! We had some decent ideas in that p….
0 replies · 10 reposts · 0 likes
RT @PontiEdoardo: If you are at @icmlconf make sure to attend @AdrianLancucki’s invited talk on our inference-time *hyper*-scaling paper (a….
0 replies · 2 reposts · 0 likes
RT @tokshop2025: The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of to….
0 replies · 3 reposts · 0 likes
RT @sukjun_hwang: Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical netw….
0 replies · 741 reposts · 0 likes
RT @_albertgu: Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn….
0 replies · 190 reposts · 0 likes
RT @ori_press: Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize….
0 replies · 60 reposts · 0 likes
RT @vllm_project: glad to see how researchers explore the flexibility of vLLM while still enjoying the performance benefit😁.
0 replies · 4 reposts · 0 likes
RT @Marktechpost: NVIDIA Researchers Introduce Dynamic Memory Sparsification (DMS) for 8× KV Cache Compression in Transformer LLMs. As the….
0 replies · 13 reposts · 0 likes
RT @PontiEdoardo: Last week marked the end of my stay as a visiting professor at @nvidia. During my time there, I became passionate about….
0 replies · 4 reposts · 0 likes