PapersAnon
@papers_anon
Followers
2K
Following
9K
Media
316
Statuses
662
Just a fan of acceleration. I read and post interesting papers. Let's all make it through.
SAITAMA
Joined February 2024
https://t.co/CJC3YWPoB6 Various links for ML and local models (not just LLMs), kept fairly updated. https://t.co/5pLfM330hp ML papers I've read that I think are interesting. I also keep a text file of all the abstracts at the top for easy searching.
rentry.org
/lmg/ Abstracts Search (Current as of the end of 11/2025) Links Google Papers Blog 12/2017 Attention Is All You Need (Transformers) 10/2018 BERT: Pre-training of Deep Bidirectional Transformers for...
1
17
141
https://t.co/KG8P96EbRs
https://t.co/SJNiDOKXW4 Repo isn't live yet. Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
0
RePo: Language Models with Context Re-Positioning From Sakana AI. Novel mechanism that uses a differentiable module, fϕ, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. Links below
1
0
6
https://t.co/JRHuiS40vL
https://t.co/7xARsPfX3f
https://t.co/uGjMI9VWkh Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
1
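A minimal Python sketch of the RePo idea above, assuming a RoPE-style backbone; the module and variable names are illustrative, not from the paper:

import torch
import torch.nn as nn

class ContextRePositioner(nn.Module):
    # Assigns continuous, context-dependent positions instead of 0..T-1
    def __init__(self, d_model: int):
        super().__init__()
        # f_phi: a small learned map from hidden states to a positive step size
        self.f_phi = nn.Sequential(
            nn.Linear(d_model, d_model // 4),
            nn.GELU(),
            nn.Linear(d_model // 4, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) -> positions: (batch, seq)
        # softplus keeps steps positive, cumsum keeps positions monotone,
        # but the spacing is learned from context rather than fixed to 1
        step = nn.functional.softplus(self.f_phi(h)).squeeze(-1)
        return step.cumsum(dim=-1)

h = torch.randn(2, 16, 64)
pos = ContextRePositioner(64)(h)  # fed to RoPE in place of torch.arange(seq)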
Group Representational Position Encoding Unified framework for positional encoding based on group actions. Supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Links below
1
0
3
https://t.co/XH0wcAKw5j
https://t.co/rF3s4NBCVn Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
1
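A small worked example of the group-action view the framework above builds on (illustrative, not the paper's code): position n acts on a 2-D feature pair by a rotation R(nθ), and because R(mθ)R(nθ) = R((m+n)θ) the attention score depends only on the relative offset, the property GRPE generalizes beyond rotations (RoPE) and additive biases (ALiBi):

import numpy as np

def rot(angle: float) -> np.ndarray:
    # 2-D rotation matrix, the group element RoPE assigns to a position
    return np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])

theta = 0.1
q, k = np.random.randn(2), np.random.randn(2)
m, n = 7, 3
# score computed with absolute positions m and n ...
score_abs = (rot(m * theta) @ q) @ (rot(n * theta) @ k)
# ... matches the score computed from the relative offset alone
score_rel = q @ (rot((n - m) * theta) @ k)
assert np.allclose(score_abs, score_rel)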
Why Do Language Model Agents Whistleblow? Introduces WhistleBench, a dataset designed to evaluate language models’ propensity for whistleblowing behavior. Found that Opus 4.1, Gemini 2.5 Pro, and Grok 4 were the most likely to whistleblow, while GPT 4.1/5 and Llama Maverick never did. Links below
1
0
5
https://t.co/UxVYP0op9b
https://t.co/CKUX2Kq3wk
https://t.co/P3XOnzJmGX
https://t.co/DB8UY9oxDX Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
1
0
4
P1: Mastering Physics Olympiads with Reinforcement Learning Combines train-time scaling via RL post-training with test-time scaling via an agentic framework on top of Qwen3 models to achieve gold-medal performance at the latest International Physics Olympiad. Links below
2
7
55
0
0
2
Virtual Width Networks From ByteDance. Decouples representational width from backbone width, expanding the embedding space while keeping backbone compute near constant. An 8× expansion accelerates optimization by over 2× for next-token and 3× for next-2-token prediction. Links below
3
18
139
https://t.co/S8HuDuRRc5
https://t.co/qnDJrA2IGP Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
4
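A minimal sketch of the width decoupling described above; the wiring and names are assumptions, not ByteDance's implementation. Embeddings live at a virtual width k·d while cheap linear maps bridge them to a backbone that still runs at width d, so backbone compute stays near constant as k grows:

import torch
import torch.nn as nn

class VirtualWidthLM(nn.Module):
    def __init__(self, vocab: int, d_backbone: int, expansion: int = 8):
        super().__init__()
        d_virtual = d_backbone * expansion            # wide representational space
        self.embed = nn.Embedding(vocab, d_virtual)
        self.down = nn.Linear(d_virtual, d_backbone)  # into the backbone
        self.up = nn.Linear(d_backbone, d_virtual)    # back out for the head
        self.head = nn.Linear(d_virtual, vocab)

    def forward(self, tokens: torch.Tensor, backbone: nn.Module) -> torch.Tensor:
        x = self.down(self.embed(tokens))  # backbone sees width d, not k*d
        return self.head(self.up(backbone(x)))

backbone = nn.TransformerEncoder(nn.TransformerEncoderLayer(64, 4, batch_first=True), 2)
model = VirtualWidthLM(vocab=1000, d_backbone=64, expansion=8)
logits = model(torch.randint(0, 1000, (2, 16)), backbone)  # (2, 16, 1000)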
Optimizing Mixture of Block Attention Introduces FlashMoBA, a hardware-aware CUDA kernel that enables efficient MoBA execution with the small block sizes that were previously inefficient on GPUs. Achieves up to 14.7× speedup over FlashAttention-2 for small blocks. Links below
2
20
136
https://t.co/Cx4gIiBbui
https://t.co/ZaroddHo1k Resources I keep updated https://t.co/CJC3YWOQLy
https://t.co/5pLfM32srR
0
0
1
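A reference-level Python sketch of the Mixture of Block Attention pattern that FlashMoBA accelerates (the naive algorithm, not the CUDA kernel; causal masking and the always-attend-to-own-block rule are skipped for brevity):

import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=16, top_k=2):
    # q, k, v: (seq, d); each query attends only to its top_k key blocks
    seq, d = k.shape
    n_blocks = seq // block_size
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    v_blocks = v[: n_blocks * block_size].view(n_blocks, block_size, d)
    block_scores = q @ k_blocks.mean(dim=1).T           # score block-mean keys
    picked = block_scores.topk(top_k, dim=-1).indices   # (seq, top_k)
    out = torch.empty_like(q)
    for i in range(seq):                                 # per-query loop for clarity only
        kk = k_blocks[picked[i]].reshape(-1, d)
        vv = v_blocks[picked[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ kk.T / d ** 0.5, dim=-1)
        out[i] = attn @ vv
    return out

q = k = v = torch.randn(64, 32)
y = moba_attention(q, k, v)  # dense attention over only 2 of the 4 key blocks per query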
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics Identifies the isotropic Gaussian as the optimal distribution for JEPAs’ embeddings and introduces a novel objective, SIGReg, to constrain them toward that ideal distribution. Links below
3
1
3
https://t.co/MdcEhMe2Jo
https://t.co/eKm8XeLV5s Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
4
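An illustrative stand-in for what "constrain embeddings to an isotropic Gaussian" means in the LeJEPA result above. The paper's SIGReg objective is built from statistical tests on random one-dimensional projections; the cruder moment-matching penalty below only shows the target being enforced (zero mean, identity covariance):

import torch

def isotropy_penalty(z: torch.Tensor) -> torch.Tensor:
    # z: (batch, d) embeddings; penalize deviation from N(0, I)
    mean = z.mean(dim=0)
    zc = z - mean
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    eye = torch.eye(z.shape[1], device=z.device)
    return (mean ** 2).sum() + ((cov - eye) ** 2).sum()

z = torch.randn(256, 32)
loss = isotropy_penalty(z)  # added alongside the JEPA prediction loss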
FedMuon: Accelerating Federated Learning with Matrix Orthogonalization Structure-aware federated optimizer that addresses the core challenges of non-IID data by coupling matrix-orthogonalized local updates with local-global alignment and cross-round momentum aggregation. Links below
1
5
62
https://t.co/vRljO0Yd4d
https://t.co/bO1feUline Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
0
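A sketch of the matrix-orthogonalization step at the core of Muon-style optimizers, for context on the tweet above; the federated pieces (local-global alignment, cross-round momentum aggregation) are not shown, and the coefficients follow the commonly used Newton-Schulz quintic:

import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Push the update matrix g toward its nearest (semi-)orthogonal matrix,
    # i.e. keep its row/column space but equalize its singular values
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)
    for _ in range(steps):
        m = x @ x.T
        x = a * x + (b * m + c * m @ m) @ x
    return x

local_update = torch.randn(128, 64)
orthogonalized = newton_schulz_orthogonalize(local_update)  # sent for aggregation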
INT vs. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats From ByteDance. Finds that MXINT8 is superior to its FP counterpart in algorithmic accuracy and hardware efficiency, and introduces a symmetric clipping method that resolves gradient bias. Links below
1
5
28
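A toy sketch of MX-style INT8 block quantization to make the MXINT8 comparison above concrete (illustrative, not the paper's kernels): each block of 32 values shares one power-of-two scale and every element is rounded into a symmetric signed 8-bit range:

import numpy as np

def mxint8_quantize(x: np.ndarray, block: int = 32):
    xb = x.reshape(-1, block)
    max_abs = np.abs(xb).max(axis=1, keepdims=True) + 1e-12
    scale = 2.0 ** np.ceil(np.log2(max_abs / 127.0))                # shared power-of-two scale
    q = np.clip(np.round(xb / scale), -127, 127).astype(np.int8)    # symmetric clipping
    return q, scale

def mxint8_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4 * 32).astype(np.float32)
q, s = mxint8_quantize(x)
max_err = np.abs(mxint8_dequantize(q, s).ravel() - x).max()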
https://t.co/fBJCjLpNI4
https://t.co/qHvZ5iFfkh One of the authors' GitHub accounts, but no code for FlexLink has been posted so far. Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
2
FlexLink: Boosting your NVLink Bandwidth by 27% without accuracy concern From Ant Group. Proposes a collective communication framework designed to aggregate heterogeneous links—NVLink, PCIe, and RDMA NICs—into a single, high-performance communication fabric. Links below
1
0
9
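A toy sketch of the link-aggregation idea behind FlexLink (nothing here is FlexLink's actual API): split one collective's payload across heterogeneous links in proportion to measured bandwidth, so PCIe and RDMA NICs add to NVLink instead of sitting idle:

def split_payload(num_bytes: int, bandwidth_gbps: dict) -> dict:
    # Returns how many bytes to route over each link; shares are proportional
    # to bandwidth so all links finish at roughly the same time
    total = sum(bandwidth_gbps.values())
    return {link: int(num_bytes * bw / total) for link, bw in bandwidth_gbps.items()}

shares = split_payload(1 << 30, {"nvlink": 400.0, "pcie": 64.0, "rdma_nic": 50.0})
# each share is sent over its own link and the partial results are recombined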