PapersAnon
@papers_anon
Followers
2K
Following
9K
Media
316
Statuses
662
Just a fan of acceleration. I read and post interesting papers. Let's all make it through.
SAITAMA
Joined February 2024
https://t.co/CJC3YWPoB6 Various links for ML and local models (not just LLMs), kept fairly updated. https://t.co/5pLfM330hp ML papers I've read that I think are interesting. I also keep a text file of all the abstracts at the top for easy searching.
rentry.org
/lmg/ Abstracts Search (Current as of the end of 11/2025) Links Google Papers Blog 12/2017 Attention Is All You Need (Transformers) 10/2018 BERT: Pre-training of Deep Bidirectional Transformers for...
1
17
141
https://t.co/KG8P96EbRs
https://t.co/SJNiDOKXW4 Repo isn't live yet. Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
0
RePo: Language Models with Context Re-Positioning From Sakana AI. Novel mechanism that uses a differentiable module, fϕ, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. Links below
1
0
6
https://t.co/JRHuiS40vL
https://t.co/7xARsPfX3f
https://t.co/uGjMI9VWkh Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
1
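A minimal Python sketch of the RePo idea above, assuming a RoPE-style backbone; the module and variable names are illustrative, not from the paper:

import torch
import torch.nn as nn

class ContextRePositioner(nn.Module):
    # Assigns continuous, context-dependent positions instead of 0..T-1
    def __init__(self, d_model: int):
        super().__init__()
        # f_phi: a small learned map from hidden states to a positive step size
        self.f_phi = nn.Sequential(
            nn.Linear(d_model, d_model // 4),
            nn.GELU(),
            nn.Linear(d_model // 4, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) -> positions: (batch, seq)
        # softplus keeps steps positive, cumsum keeps positions monotone,
        # but the spacing is learned from context rather than fixed to 1
        step = nn.functional.softplus(self.f_phi(h)).squeeze(-1)
        return step.cumsum(dim=-1)

h = torch.randn(2, 16, 64)
pos = ContextRePositioner(64)(h)  # fed to RoPE in place of torch.arange(seq)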
Group Representational Position Encoding Unified framework for positional encoding based on group actions. Supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Links below
1
0
3
https://t.co/XH0wcAKw5j
https://t.co/rF3s4NBCVn Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
1
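A small worked example of the group-action view the framework above builds on (illustrative, not the paper's code): position n acts on a 2-D feature pair by a rotation R(nθ), and because R(mθ)R(nθ) = R((m+n)θ) the attention score depends only on the relative offset, the property GRPE generalizes beyond rotations (RoPE) and additive biases (ALiBi):

import numpy as np

def rot(angle: float) -> np.ndarray:
    # 2-D rotation matrix, the group element RoPE assigns to a position
    return np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])

theta = 0.1
q, k = np.random.randn(2), np.random.randn(2)
m, n = 7, 3
# score computed with absolute positions m and n ...
score_abs = (rot(m * theta) @ q) @ (rot(n * theta) @ k)
# ... matches the score computed from the relative offset alone
score_rel = q @ (rot((n - m) * theta) @ k)
assert np.allclose(score_abs, score_rel)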
Why Do Language Model Agents Whistleblow? Introduces WhistleBench, a dataset designed to evaluate language models’ propensity for whistleblowing behavior. Found that Opus 4.1, Gemini 2.5 Pro, and Grok 4 were the most likely to whistleblow, while GPT 4.1/5 and Llama Maverick never did. Links below
1
0
5
https://t.co/UxVYP0op9b
https://t.co/CKUX2Kq3wk
https://t.co/P3XOnzJmGX
https://t.co/DB8UY9oxDX Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
1
0
4
P1: Mastering Physics Olympiads with Reinforcement Learning Combines train-time scaling via RL post-training with test-time scaling via an agentic framework on top of Qwen3 models to achieve gold-medal performance at the latest International Physics Olympiad. Links below
2
7
55
0
0
2
Virtual Width Networks From ByteDance. Decouples representational width from backbone width, expanding the embedding space while keeping backbone compute near constant. An 8× expansion accelerates optimization by over 2× for next-token and 3× for next-2-token prediction. Links below
3
18
139
https://t.co/S8HuDuRRc5
https://t.co/qnDJrA2IGP Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
4
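A minimal sketch of the width decoupling described above; the wiring and names are assumptions, not ByteDance's implementation. Embeddings live at a virtual width k·d while cheap linear maps bridge them to a backbone that still runs at width d, so backbone compute stays near constant as k grows:

import torch
import torch.nn as nn

class VirtualWidthLM(nn.Module):
    def __init__(self, vocab: int, d_backbone: int, expansion: int = 8):
        super().__init__()
        d_virtual = d_backbone * expansion            # wide representational space
        self.embed = nn.Embedding(vocab, d_virtual)
        self.down = nn.Linear(d_virtual, d_backbone)  # into the backbone
        self.up = nn.Linear(d_backbone, d_virtual)    # back out for the head
        self.head = nn.Linear(d_virtual, vocab)

    def forward(self, tokens: torch.Tensor, backbone: nn.Module) -> torch.Tensor:
        x = self.down(self.embed(tokens))  # backbone sees width d, not k*d
        return self.head(self.up(backbone(x)))

backbone = nn.TransformerEncoder(nn.TransformerEncoderLayer(64, 4, batch_first=True), 2)
model = VirtualWidthLM(vocab=1000, d_backbone=64, expansion=8)
logits = model(torch.randint(0, 1000, (2, 16)), backbone)  # (2, 16, 1000)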
Optimizing Mixture of Block Attention Introduces FlashMoBA, a hardware-aware CUDA kernel that enables efficient MoBA execution with the small block sizes that were previously inefficient on GPUs. Achieves up to 14.7× speedup over FlashAttention-2 for small blocks. Links below
2
20
136
https://t.co/Cx4gIiBbui
https://t.co/ZaroddHo1k Resources I keep updated https://t.co/CJC3YWOQLy
https://t.co/5pLfM32srR
0
0
1
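A reference-level Python sketch of the Mixture of Block Attention pattern that FlashMoBA accelerates (the naive algorithm, not the CUDA kernel; causal masking and the always-attend-to-own-block rule are skipped for brevity):

import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=16, top_k=2):
    # q, k, v: (seq, d); each query attends only to its top_k key blocks
    seq, d = k.shape
    n_blocks = seq // block_size
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    v_blocks = v[: n_blocks * block_size].view(n_blocks, block_size, d)
    block_scores = q @ k_blocks.mean(dim=1).T           # score block-mean keys
    picked = block_scores.topk(top_k, dim=-1).indices   # (seq, top_k)
    out = torch.empty_like(q)
    for i in range(seq):                                 # per-query loop for clarity only
        kk = k_blocks[picked[i]].reshape(-1, d)
        vv = v_blocks[picked[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ kk.T / d ** 0.5, dim=-1)
        out[i] = attn @ vv
    return out

q = k = v = torch.randn(64, 32)
y = moba_attention(q, k, v)  # dense attention over only 2 of the 4 key blocks per query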
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics Identifies the isotropic Gaussian as the optimal distribution for JEPAs’ embeddings and introduces a novel objective, SIGReg, to constrain them toward that ideal distribution. Links below
3
1
3
https://t.co/MdcEhMe2Jo
https://t.co/eKm8XeLV5s Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
4
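An illustrative stand-in for what "constrain embeddings to an isotropic Gaussian" means in the LeJEPA result above. The paper's SIGReg objective is built from statistical tests on random one-dimensional projections; the cruder moment-matching penalty below only shows the target being enforced (zero mean, identity covariance):

import torch

def isotropy_penalty(z: torch.Tensor) -> torch.Tensor:
    # z: (batch, d) embeddings; penalize deviation from N(0, I)
    mean = z.mean(dim=0)
    zc = z - mean
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    eye = torch.eye(z.shape[1], device=z.device)
    return (mean ** 2).sum() + ((cov - eye) ** 2).sum()

z = torch.randn(256, 32)
loss = isotropy_penalty(z)  # added alongside the JEPA prediction loss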
FedMuon: Accelerating Federated Learning with Matrix Orthogonalization Structure-aware federated optimizer that addresses the core challenges of non-IID data by coupling matrix-orthogonalized local updates with local-global alignment and cross-round momentum aggregation. Links below
1
5
62
https://t.co/vRljO0Yd4d
https://t.co/bO1feUline Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
0
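A sketch of the matrix-orthogonalization step at the core of Muon-style optimizers, for context on the tweet above; the federated pieces (local-global alignment, cross-round momentum aggregation) are not shown, and the coefficients follow the commonly used Newton-Schulz quintic:

import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Push the update matrix g toward its nearest (semi-)orthogonal matrix,
    # i.e. keep its row/column space but equalize its singular values
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)
    for _ in range(steps):
        m = x @ x.T
        x = a * x + (b * m + c * m @ m) @ x
    return x

local_update = torch.randn(128, 64)
orthogonalized = newton_schulz_orthogonalize(local_update)  # sent for aggregation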
INT vs. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats From ByteDance. Finds that MXINT8 is superior to its FP counterpart in algorithmic accuracy and hardware efficiency, and introduces a symmetric clipping method that resolves gradient bias. Links below
1
5
28
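A toy sketch of MX-style INT8 block quantization to make the MXINT8 comparison above concrete (illustrative, not the paper's kernels): each block of 32 values shares one power-of-two scale and every element is rounded into a symmetric signed 8-bit range:

import numpy as np

def mxint8_quantize(x: np.ndarray, block: int = 32):
    xb = x.reshape(-1, block)
    max_abs = np.abs(xb).max(axis=1, keepdims=True) + 1e-12
    scale = 2.0 ** np.ceil(np.log2(max_abs / 127.0))                # shared power-of-two scale
    q = np.clip(np.round(xb / scale), -127, 127).astype(np.int8)    # symmetric clipping
    return q, scale

def mxint8_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4 * 32).astype(np.float32)
q, s = mxint8_quantize(x)
max_err = np.abs(mxint8_dequantize(q, s).ravel() - x).max()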
https://t.co/fBJCjLpNI4
https://t.co/qHvZ5iFfkh One of the authors' GitHub accounts, but no code for FlexLink has been posted so far. Resources I keep updated https://t.co/CJC3YWPoB6
https://t.co/5pLfM330hp
0
0
2
FlexLink: Boosting your NVLink Bandwidth by 27% without accuracy concern From Ant Group. Proposes a collective communication framework designed to aggregate heterogeneous links—NVLink, PCIe, and RDMA NICs—into a single, high-performance communication fabric. Links below
1
0
9
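A toy sketch of the link-aggregation idea behind FlexLink (nothing here is FlexLink's actual API): split one collective's payload across heterogeneous links in proportion to measured bandwidth, so PCIe and RDMA NICs add to NVLink instead of sitting idle:

def split_payload(num_bytes: int, bandwidth_gbps: dict) -> dict:
    # Returns how many bytes to route over each link; shares are proportional
    # to bandwidth so all links finish at roughly the same time
    total = sum(bandwidth_gbps.values())
    return {link: int(num_bytes * bw / total) for link, bw in bandwidth_gbps.items()}

shares = split_payload(1 << 30, {"nvlink": 400.0, "pcie": 64.0, "rdma_nic": 50.0})
# each share is sent over its own link and the partial results are recombined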