PapersAnon

@papers_anon

Followers
2K
Following
8K
Media
276
Statuses
581

Just a fan of acceleration. I read and post interesting papers. Let's all make it through.

SAITAMA
Joined February 2024
@papers_anon
PapersAnon
1 year
Various links for ML and local models (not just LLMs), kept fairly updated, plus ML papers I've read that I think are interesting. I also keep a text file at the top with all the abstracts for easy searching.
rentry.org
/lmg/ Abstracts Search (Current as of end of 07/2025). Links, Google, Papers, Blog. 12/2017 Attention Is All You Need (Transformers); 10/2018 BERT: Pre-training of Deep Bidirectional Transformers for Language...
@papers_anon
PapersAnon
11 hours
Resources I keep updated.
@papers_anon
PapersAnon
11 hours
FastCSP: Accelerated Molecular Crystal Structure Prediction with Universal Model for Atoms. From Meta. Open-source, high-throughput CSP workflow based on machine learning interatomic potentials. Benchmarked on a curated set of 28 mostly rigid molecules, consistently generating...
@papers_anon
PapersAnon
11 hours
Resources I keep updated.
@papers_anon
PapersAnon
11 hours
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models. Mixed-precision quantization algorithm and matrix multiplication kernel based on MX data formats. Tailored for the Blackwell architecture; supports arbitrary combinations of...
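The MX idea in one sketch: a minimal, illustrative block quantizer where each 32-element block shares a power-of-two scale. The element grid here is plain signed integers rather than the FP4/FP6/FP8 element formats MicroMix actually mixes, and nothing below is from the paper's kernel.

```python
import numpy as np

def mx_quantize(x, block=32, bits=6):
    """Toy microscaling (MX-style) quantization: every `block`
    consecutive values share one power-of-two scale, and elements are
    rounded onto a `bits`-bit signed integer grid (real MX element
    formats are FP4/FP6/FP8; an int grid keeps the sketch simple)."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block
    xp = np.pad(x, (0, pad))
    blocks = xp.reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    amax[amax == 0] = 1.0
    # shared scale per block, rounded up to a power of two so no
    # element overflows the integer grid
    scale = 2.0 ** np.ceil(np.log2(amax / qmax))
    q = np.clip(np.round(blocks / scale), -qmax, qmax)
    return (q * scale).reshape(-1)[: len(x)]

x = np.random.default_rng(0).normal(size=256)
err6 = np.abs(mx_quantize(x, bits=6) - x).max()
err4 = np.abs(mx_quantize(x, bits=4) - x).max()
print(err6, err4)  # coarser grid at fewer bits
```

Mixing precisions then amounts to picking `bits` per layer or per tensor based on sensitivity, which is the knob the paper's algorithm tunes.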
@papers_anon
PapersAnon
2 days
Resources I keep updated.
@papers_anon
PapersAnon
2 days
Towards Higher Effective Rank in Parameter-efficient Fine-tuning using Khatri-Rao Product. Novel PEFT technique designed to overcome the representation limitations of low-rank methods. Maintains the computational advantages of LoRA in terms of training speed and memory...
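A rough numerical illustration of why a Khatri-Rao style construction can beat a plain low-rank product on effective rank at a comparable parameter budget. The row-wise (face-splitting) product below is my stand-in and not necessarily the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 16, 4  # update shape m x n, LoRA rank r

# LoRA: delta_W = A @ B, rank capped at r
A, B = rng.normal(size=(m, r)), rng.normal(size=(r, n))
rank_lora = np.linalg.matrix_rank(A @ B)

# Row-wise Khatri-Rao (face-splitting) product: row i is
# kron(C[i], D[i]); with C and D both m x 4 this also yields an
# m x 16 update from a similar parameter count, but its rank is
# not capped at 4.
C, D = rng.normal(size=(m, 4)), rng.normal(size=(m, 4))
KR = np.stack([np.kron(C[i], D[i]) for i in range(m)])
rank_kr = np.linalg.matrix_rank(KR)

print(rank_lora, rank_kr)
```

For generic factors the Khatri-Rao rows are linearly independent, so the effective rank climbs to min(m, n) instead of staying pinned at r.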
@papers_anon
PapersAnon
7 days
Only the training and modeling code. No models. Examples. Resources I keep updated.
@papers_anon
PapersAnon
7 days
TTS-1 Technical Report. Not an open model but training/modeling code provided. Llama 3.2 1/8B as SpeechLM backbones. Uses a novel high-resolution audio codec for 48 kHz speech synthesis. Adapts GRPO to target WER, speaker similarity, DNSMOS scores. 11 languages with fine-grained...
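A hedged sketch of what a composite GRPO reward over those three signals might look like; only the metric names come from the report, the weights and scaling below are made up.

```python
def composite_reward(wer, spk_sim, dnsmos,
                     w_wer=1.0, w_sim=1.0, w_mos=0.5):
    """Toy scalar reward mixing the three signals named in the report.
    Lower WER is better, so it enters negatively; speaker similarity
    (e.g. cosine in [0, 1]) and DNSMOS (roughly [1, 5]) enter
    positively. Weights are illustrative, not from the paper."""
    return -w_wer * wer + w_sim * spk_sim + w_mos * (dnsmos - 1.0) / 4.0

# a cleaner, more similar sample should score higher than a noisy one
good = composite_reward(wer=0.05, spk_sim=0.9, dnsmos=4.2)
bad = composite_reward(wer=0.35, spk_sim=0.6, dnsmos=2.8)
print(good > bad)
```

GRPO then only needs this scalar per sampled utterance to compute group-relative advantages, which is what makes bolting perceptual metrics onto a SpeechLM practical.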
@papers_anon
PapersAnon
12 days
Code might be posted on their git. Resources I keep updated.
@papers_anon
PapersAnon
12 days
Group Sequence Policy Optimization. From Qwen team. Reinforcement learning algorithm that defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. Links below
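The core of GSPO can be sketched in a few lines. The toy numbers below are mine, but the ratio definition (length-normalized sequence likelihood) and the sequence-level clipping follow the paper's description.

```python
import numpy as np

def gspo_objective(logp_new, logp_old, advantages, eps=0.05):
    """Sketch of GSPO's clipped objective. Each element of
    logp_new/logp_old is the per-token log-prob array of one sampled
    response; advantages holds one (group-normalized) advantage per
    response. Unlike token-level PPO/GRPO, the importance ratio is the
    length-normalized *sequence* likelihood ratio, and clipping is
    applied per sequence rather than per token."""
    vals = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        s = np.exp(np.mean(lp_new - lp_old))  # (pi_new/pi_old)^(1/|y|)
        clipped = np.clip(s, 1 - eps, 1 + eps)
        vals.append(min(s * adv, clipped * adv))
    return float(np.mean(vals))

# two sampled responses: one with a ratio above 1 + eps (clipped),
# one with a ratio of exactly 1
obj = gspo_objective(
    logp_new=[np.array([-1.0, -1.0]), np.array([-1.2, -1.0])],
    logp_old=[np.array([-1.2, -1.2]), np.array([-1.2, -1.0])],
    advantages=[1.0, -1.0],
)
print(obj)
```

The exponent 1/|y| keeps the ratio on a per-token scale, so a long response doesn't blow past the clip range just because small token-level drifts compound.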
@papers_anon
PapersAnon
20 days
Repo isn't live yet. Resources I keep updated.
@papers_anon
PapersAnon
20 days
Mixture of Raytraced Experts. Stacked MoE architecture that can dynamically select sequences of experts, producing computational graphs of variable width and depth. Allows predictions with increasing accuracy as the computation cycles through the experts' sequence. Links below
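A toy sketch of the control flow being described: a routed *sequence* of experts with a readout after every cycle. The random linear experts are placeholders, so this shows only the variable-depth routing and anytime prediction, not the accuracy gains.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_EXPERTS, MAX_CYCLES = 4, 3, 5

# toy experts: fixed random linear maps (stand-ins for trained FFNs)
experts = [rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(N_EXPERTS)]
router = rng.normal(size=(DIM, N_EXPERTS))
readout = rng.normal(size=DIM)

def forward(x, cycles):
    """Route through a sequence of experts of length `cycles`; a
    prediction can be read out after every cycle (anytime output), so
    stopping early trades accuracy for compute."""
    h, preds = x.copy(), []
    for _ in range(cycles):
        k = int(np.argmax(h @ router))    # pick the next expert in the path
        h = h + experts[k] @ h            # residual expert update
        preds.append(float(readout @ h))  # prediction so far
    return preds

preds = forward(rng.normal(size=DIM), MAX_CYCLES)
print(len(preds))  # one refined prediction per cycle
```

Because the chosen expert depends on the evolving hidden state, different inputs trace different computational graphs, which is the variable width/depth property the abstract highlights.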
@papers_anon
PapersAnon
21 days
Resources I keep updated.
@papers_anon
PapersAnon
21 days
AdaMuon: Adaptive Muon Optimizer. Augments Muon with a per-parameter second-moment modulation that captures orthogonal gradient updates to ensure update-level adaptivity, and an RMS-aligned rescaling that regulates the overall update magnitude by aligning it with the intrinsic...
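My reading of the described update, as a rough sketch: SVD stands in for Muon's Newton-Schulz orthogonalization, and all constants are illustrative rather than from the paper.

```python
import numpy as np

def adamuon_step(G, M, V, lr=0.02, beta1=0.9, beta2=0.999, eps=1e-8):
    """One sketched AdaMuon-style update for a weight matrix G-gradient.
    M: momentum; V: per-parameter second moment of the *orthogonalized*
    update. Orthogonalization here uses SVD for clarity (Muon itself
    uses a Newton-Schulz iteration)."""
    M = beta1 * M + (1 - beta1) * G
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    O = U @ Vt                           # orthogonalized momentum
    V = beta2 * V + (1 - beta2) * O**2   # second-moment modulation
    upd = O / (np.sqrt(V) + eps)
    # RMS-aligned rescaling: pin the update to a fixed RMS so the
    # elementwise division cannot inflate or shrink its magnitude
    upd *= np.sqrt(O.size) / (np.linalg.norm(upd) + eps)
    return -lr * upd, M, V

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 4))
delta, M, V = adamuon_step(G, np.zeros((4, 4)), np.zeros((4, 4)))
print(np.linalg.norm(delta / 0.02))  # norm fixed by the rescale
```

The two pieces map onto the tweet's description: the `V` term gives Adam-style per-parameter adaptivity on top of the orthogonal direction, and the final rescale keeps the overall step size predictable.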
@papers_anon
PapersAnon
28 days
Resources I keep updated.
@papers_anon
PapersAnon
28 days
Pre-Trained Policy Discriminators are General Reward Models. Proposes a scalable pre-training method named Policy Discriminative Learning, which trains a reward model to discern identical policies and discriminate different ones. Improved Qwen2.5-32B from 64.49% to 70.47% on 20...
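A minimal sketch of the discriminative objective as described: pairs of responses from the *same* policy should score above pairs from *different* policies. The distance-based scorer is a toy stand-in for a trained reward model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discrimination_loss(rm_score, pairs_same, pairs_diff):
    """Policy-discriminative objective sketch: a Bradley-Terry style
    ranking loss pushing same-policy pairs above different-policy
    pairs. rm_score is any scorer over a response pair."""
    losses = [-np.log(sigmoid(rm_score(a, b) - rm_score(c, d)))
              for (a, b), (c, d) in zip(pairs_same, pairs_diff)]
    return float(np.mean(losses))

# toy scorer: negative distance between response embeddings, so
# responses from the same policy (nearby) score higher
score = lambda a, b: -np.linalg.norm(a - b)
p1 = [np.array([0.0, 0.0]), np.array([0.1, 0.0])]  # same policy: close
p2 = [np.array([1.0, 1.0])]                        # different policy
loss = discrimination_loss(score, [(p1[0], p1[1])], [(p1[0], p2[0])])
print(loss)
```

The appeal is that policy identity is free supervision: rollouts label themselves, so this objective scales like pre-training rather than like preference annotation.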
@papers_anon
PapersAnon
1 month
Resources I keep updated.
@papers_anon
PapersAnon
1 month
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning. 9B VLM that introduces Reinforcement Learning with Curriculum Sampling and dynamic sampling expansion via ratio-based EMA to drive large-scale, cross-domain reasoning...
@papers_anon
PapersAnon
1 month
Resources I keep updated.
@papers_anon
PapersAnon
1 month
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling. Scales down the intermediate activations while keeping their gradients unchanged. This leaves information in the activations intact and avoids the gradient vanishing problem associated...
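The gradient-preserving trick is essentially a stop-gradient identity: the forward pass outputs alpha * x while the backward pass sees the identity map. A self-contained sketch with a tiny forward-mode dual number (my construction, not the paper's code):

```python
class Dual:
    """Minimal forward-mode autodiff value: val plus a gradient slot."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.grad + o.grad)
    def __mul__(self, s):  # scalar multiply
        return Dual(self.val * s, self.grad * s)

def stop_gradient(d):
    """Pass the value through, block the gradient (like detach)."""
    return Dual(d.val, 0.0)

def gpas(x, alpha):
    """Gradient-preserving activation scaling sketch: forward output is
    alpha * x, but the stop-gradient term makes the backward pass see
    the identity, so gradients are not scaled down with the activations."""
    return x + stop_gradient(x * alpha + x * (-1.0))

x = Dual(2.0, grad=1.0)
y = gpas(x, alpha=0.5)
print(y.val, y.grad)  # value is scaled, gradient is not
```

In a real framework the same line would read `x + (alpha * x - x).detach()`: the activations shrink (avoiding growth/overflow), while backpropagation behaves as if no scaling happened, which is what sidesteps the vanishing-gradient issue the abstract mentions.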