Evan Miller @EvMill X Profile

Evan Miller

@EvMill

Followers

5K

Following

269

Media

63

Statuses

1K

Statistically inclined software developer, occasional blogger about math + stats stuff. Working on evals @AnthropicAI

NYC

Joined May 2009

Evan Miller

@EvMill

2 years

I hit a bug in the Attention formula that’s been overlooked for 8+ years. All Transformer models (GPT, LLaMA, etc) are affected. Researchers isolated the bug last month – but they missed a simple solution… Why LLM designers should stop using Softmax 👇

evanmiller.org

Let’s fix these pesky Transformer outliers using Softmax One and QuietAttention.

76

357

2K

Evan Miller

@EvMill

7 months

RT @klaviyo: 🚀 New on the Klaviyo Data Science Podcast: @EvMill joins us to discuss his paper, Adding Error Bars to Evals: A Statistical Ap…

0

4

0

Evan Miller

@EvMill

9 months

RT @AnthropicAI: We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-…

0

301

0

Evan Miller

@EvMill

10 months

RT @MariusHobbhahn: This paper on the statistics of evals is great (and seems to be flying under the radar): The a…

arxiv.org

Evaluations are critical for understanding the capabilities of large language models (LLMs). Fundamentally, evaluations are experiments; but the literature on evaluations has largely ignored the...

0

37

0

Evan Miller

@EvMill

10 months

RT @JeremyDanielFox: Awesome new research by my friend and colleague @EvMill — adding error bars to evals! Always great to see the Central…

0

2

0

Evan Miller

@EvMill

10 months

RT @emollick: I cannot agree with this more. Please use basic research methods on AI benchmarking!

0

30

0

Evan Miller

@EvMill

10 months

RT @AnthropicAI: New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty.…

anthropic.com

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

0

308

0

Evan Miller

@EvMill

11 months

RT @DarioAmodei: Machines of Loving Grace: my essay on how AI could transform the world for the better.

darioamodei.com

How AI Could Transform the World for the Better

0

1K

0

Evan Miller

@EvMill

1 year

New sequential A/B test from @Zalando based on the Lévy inequality – check it out!

arxiv.org

Large-scale randomised experiments have become a standard tool for developing products and improving user experience. To reduce losses from shipping harmful changes experimental results are, in...

0

2

7

Evan Miller

@EvMill

1 year

I think I've finally cracked quantiles… A/B testing medians, instead of means, usually requires an expensive bootstrap. But we can use a likelihood-ratio test (Wilks' theorem) instead. This reduces the quantile problem to a few simple formulas. Read on!

arxiv.org

Quantiles can represent key operational and business metrics, but the computational challenges associated with inference has hampered their adoption in online experimentation. One-sample...

0

1

18
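
As a rough illustration of the idea in the tweet above, here is a minimal sketch of a one-sample likelihood-ratio test for a median, assuming the simplest possible setup. It is not taken from the linked paper: the function name `median_lr_test` and the binomial formulation are assumptions made for illustration. Under the null hypothesis that the median equals `m0`, each observation falls at or below `m0` with probability 0.5, so the count below `m0` is binomial, and Wilks' theorem says the likelihood-ratio statistic is approximately chi-squared with one degree of freedom. No bootstrap is needed.

```python
import math

def median_lr_test(xs, m0):
    """Hypothetical helper: likelihood-ratio test of H0 'median == m0'.

    Returns (statistic, p_value). The statistic compares the binomial
    likelihood at the observed fraction k/n against the null value 0.5.
    """
    n = len(xs)
    k = sum(1 for x in xs if x <= m0)  # observations at or below m0

    def term(count):
        # count * log(count / expected), with the 0 * log(0) = 0 convention
        return 0.0 if count == 0 else count * math.log(count / (n / 2))

    stat = max(0.0, 2.0 * (term(k) + term(n - k)))
    # Survival function of chi-squared with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p = median_lr_test([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5.5)
print(stat, p)
```

With half the sample on each side of `m0` the statistic is zero, and it grows as the split becomes more lopsided. The chi-squared approximation is asymptotic, so very small samples would call for an exact binomial test instead.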

Evan Miller

@EvMill

2 years

RT @natfriedman: Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of sc…

0

15K

0

Evan Miller

@EvMill

2 years

RT @BeidiChen: @ggerganov @EvMill The blog about Softmax+1 plays a very important role when we were trying to identify the root cause of th…

0

1

0

Evan Miller

@EvMill

2 years

RT @ggerganov: Have a few thoughts about this approach. But most importantly, I'm happy to see @EvMill's idea on softmax1 recognized - to m…

arxiv.org

Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly,...

0

16

0

Evan Miller

@EvMill

2 years

👀

Markus Nagel

@mnagel87

2 years

@Tracing47202686 @yell1337 @TiRune Unlike with clipped softmax, to achieve an exact zero in the output using softmax1 for a (partial) no-update, the input needs to be -infinity. However, after @EvMill's blog post we experimented with softmax1 and found it in practice competitive with our proposed approaches.

1

0

13
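
The point in the quoted tweet can be checked numerically. The sketch below assumes the softmax1 definition from the Softmax One blog post, exp(x_i) / (1 + sum_j exp(x_j)), and uses only the Python standard library; the function names and the stabilizing shift are choices made for this illustration, not code from any of the linked repositories.

```python
import math

def softmax(xs):
    """Standard softmax: outputs are positive and always sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax1(xs):
    """softmax1: exp(x_i) / (1 + sum_j exp(x_j)).

    The extra 1 in the denominator lets the outputs sum to less than 1.
    Shifting by m = max(0, max(xs)) avoids overflow; the constant 1
    becomes exp(-m) after the shift.
    """
    m = max(0.0, max(xs))
    exps = [math.exp(x - m) for x in xs]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]

print(softmax([0.0, 0.0]))             # each weight is 1/2
print(softmax1([0.0, 0.0]))            # each weight is 1/3
print(softmax1([-30.0]))               # tiny, but still strictly positive
print(softmax1([float("-inf"), 0.0]))  # exact zero only at -infinity
```

Very negative finite inputs push the softmax1 outputs close to zero without reaching it, which matches the observation that an exact zero requires an input of -infinity.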

Evan Miller

@EvMill

2 years

RT @Astatide42: Results of my latest nerdsnipe from @TetraspaceWest! The plot below shows the predicted shape of the water flow, with a mo…

0

4

0

Evan Miller

@EvMill

2 years

RT @capetorch: Following @EvMill great blog post on encountered issues on the GPT-like models training that appear to be related to the Sof…

0

10

0

Evan Miller

@EvMill

2 years

RT @PPapakonNucl: Kurt Vonnegut's 1969 address to the American Physical Society @APSphysics --on the innocence of the "old-fashioned scient…

docs.google.com

ADDRESS TO THE AMERICAN PHYSICAL SOCIETY New York City, 1969 MY ONLY BROTHER is a cloud physicist. He is nine years older than I am, and was an inspiration to me in my youth. He used to work with the...

0

18

0

Evan Miller

@EvMill

2 years

Softmax1 update… We now have support for ⚡️ Flash Attention ⚡️. This lets us test much larger models than before! To get the code, just: pip install flash-attention-softmax-n. Or clone / star the GitHub repo (linked below). All credit / kudos to Chris Murphy.

github.com

CUDA and Triton implementations of Flash Attention with SoftmaxN. - softmax1/Flash-Attention-Softmax-N

2

12

52

Evan Miller

@EvMill

2 years

Softmax1, Week 2. Second set of empirical results are in, and they are… 🌸 promising 🌸. Weight kurtosis is roughly the same – but activation kurtosis improved 30X (!!) and maximum activation magnitude reduced 15X (!). Read more from @johnowhitaker: