Evan Miller

@EvMill

Followers 5K · Following 269 · Media 63 · Statuses 1K

Statistically inclined software developer, occasional blogger about math + stats stuff. Working on evals @AnthropicAI

NYC
Joined May 2009
@EvMill
Evan Miller
2 years
I hit a bug in the Attention formula that’s been overlooked for 8+ years. All Transformer models (GPT, LLaMA, etc) are affected. Researchers isolated the bug last month – but they missed a simple solution… Why LLM designers should stop using Softmax 👇
evanmiller.org
Let’s fix these pesky Transformer outliers using Softmax One and QuietAttention.
76
357
2K
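For context, the fix proposed in the linked post is a one-line change: add 1 to the softmax denominator so an attention head can put (near-)zero weight everywhere instead of being forced to allocate its full weight somewhere. A minimal NumPy sketch of the idea (function names here are illustrative, not from the post's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Standard softmax: weights are forced to sum to exactly 1.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def softmax_one(x, axis=-1):
    # "Softmax One" / quiet attention: exp(x_i) / (1 + sum_j exp(x_j)).
    # The implicit extra logit of 0 lets every weight shrink toward 0
    # when all inputs are very negative, so a head can abstain.
    m = np.max(x, axis=axis, keepdims=True)
    e = np.exp(x - m)
    return e / (e.sum(axis=axis, keepdims=True) + np.exp(-m))

scores = np.array([-8.0, -9.0, -10.0])  # a head with nothing to attend to
print(softmax(scores).sum())      # ~1.0 -- must attend somewhere
print(softmax_one(scores).sum())  # ~0.0005 -- can effectively abstain
```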
@EvMill
Evan Miller
7 months
RT @klaviyo: 🚀 New on the Klaviyo Data Science Podcast: @EvMill joins us to discuss his paper, Adding Error Bars to Evals: A Statistical Ap….
0
4
0
@EvMill
Evan Miller
9 months
RT @AnthropicAI: We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-….
0
301
0
@EvMill
Evan Miller
10 months
RT @JeremyDanielFox: Awesome new research by my friend and colleague @EvMill — adding error bars to evals! Always great to see the Central….
0
2
0
@EvMill
Evan Miller
10 months
RT @emollick: I cannot agree with this more. Please use basic research methods on AI benchmarking!
[three images]
0
30
0
@EvMill
Evan Miller
10 months
RT @AnthropicAI: New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty….
anthropic.com
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
0
308
0
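The core suggestion of the paper is to treat an eval score as a sample mean over questions and report its standard error. A hedged sketch of the basic Central Limit Theorem computation (the paper also covers refinements like clustered standard errors and paired comparisons, which this omits):

```python
import numpy as np

def eval_score_with_error_bars(correct, z=1.96):
    """Mean eval score with a CLT-based 95% confidence interval.

    `correct` is a 0/1 array with one entry per eval question.
    """
    correct = np.asarray(correct, dtype=float)
    n = correct.size
    mean = correct.mean()
    sem = correct.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    return mean, (mean - z * sem, mean + z * sem)

# e.g. a model that answered 437 of 500 questions correctly
scores = np.array([1] * 437 + [0] * 63)
mean, (lo, hi) = eval_score_with_error_bars(scores)
print(f"accuracy = {mean:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```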
@EvMill
Evan Miller
11 months
RT @DarioAmodei: Machines of Loving Grace: my essay on how AI could transform the world for the better.
darioamodei.com
How AI Could Transform the World for the Better
0
1K
0
@EvMill
Evan Miller
1 year
I think I've finally cracked quantiles… A/B testing medians, instead of means, usually requires an expensive bootstrap. But we can use a likelihood-ratio test (Wilks' theorem) instead. This reduces the quantile problem to a few simple formulas. Read on!
arxiv.org
Quantiles can represent key operational and business metrics, but the computational challenges associated with inference have hampered their adoption in online experimentation. One-sample...
0
1
18
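The trick, in miniature: whether an observation falls below a hypothesized quantile is a Bernoulli event, so "the median is m" reduces to a binomial likelihood-ratio test, and Wilks' theorem supplies the null distribution. A one-sample sketch of that reduction (the paper's two-sample formulas differ in detail):

```python
import numpy as np
from scipy.stats import chi2

def median_lr_test(x, m):
    """Likelihood-ratio test of H0: median(x) == m, via Wilks' theorem.

    Counts observations below m; under H0 that count is Binomial(n, 0.5),
    and -2 log LR is asymptotically chi-squared with 1 degree of freedom.
    """
    x = np.asarray(x)
    n = x.size
    k = np.count_nonzero(x < m)
    p_hat = k / n  # unrestricted MLE of P(X < m)
    p0 = 0.5       # value forced by the null hypothesis
    eps = 1e-12    # guard the k == 0 / k == n edge cases
    ll = lambda p: k * np.log(max(p, eps)) + (n - k) * np.log(max(1 - p, eps))
    stat = 2.0 * (ll(p_hat) - ll(p0))
    return stat, chi2.sf(stat, df=1)

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=2000)  # true median = ln 2
print(median_lr_test(sample, np.log(2)))  # large p-value: H0 not rejected
print(median_lr_test(sample, 1.0))        # tiny p-value: H0 rejected
```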
@EvMill
Evan Miller
2 years
RT @natfriedman: Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of sc….
0
15K
0
@EvMill
Evan Miller
2 years
RT @BeidiChen: @ggerganov @EvMill The blog about Softmax+1 played a very important role when we were trying to identify the root cause of th….
0
1
0
@EvMill
Evan Miller
2 years
👀
@mnagel87
Markus Nagel
2 years
@Tracing47202686 @yell1337 @TiRune Unlike with clipped softmax, achieving an exact zero in the softmax1 output for a (partial) no-update requires the input to be -infinity. However, after @EvMill's blog post we experimented with softmax1 and found it in practice competitive with our proposed approaches.
1
0
13
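Nagel's point, numerically: softmax1 only reaches an exact zero in the limit of -infinity inputs, whereas a clipped softmax can hit an exact zero at finite inputs. A quick NumPy check (the clipped variant below is a generic stretch-then-clip standing in for the paper's exact formulation, with illustrative hyperparameters):

```python
import numpy as np

def softmax_one(x):
    m = x.max()
    e = np.exp(x - m)
    return e / (e.sum() + np.exp(-m))

def clipped_softmax(x, zeta=1.1, gamma=-0.1):
    # Stretch softmax outputs to [gamma, zeta], then clip to [0, 1]:
    # small weights get clipped to an exact 0 at finite inputs.
    e = np.exp(x - x.max())
    s = e / e.sum()
    return np.clip((zeta - gamma) * s + gamma, 0.0, 1.0)

x = np.array([-4.0, 0.0, 2.0])
print(softmax_one(x))      # small but strictly positive everywhere
print(clipped_softmax(x))  # the -4 logit is clipped to exactly 0.0
```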
@EvMill
Evan Miller
2 years
RT @Astatide42: Results of my latest nerdsnipe from @TetraspaceWest!. The plot below shows the predicted shape of the water flow, with a mo….
0
4
0
@EvMill
Evan Miller
2 years
RT @capetorch: Following @EvMill's great blog post on issues encountered in GPT-like model training that appear to be related to the Sof….
0
10
0
@EvMill
Evan Miller
2 years
Softmax1 update… We now have support for ⚡️ Flash Attention ⚡️. This lets us test much larger models than before! To get the code, just pip install flash-attention-softmax-n, or clone / star the GitHub repo here. All credit / kudos to Chris Murphy.
github.com
CUDA and Triton implementations of Flash Attention with SoftmaxN. - softmax1/Flash-Attention-Softmax-N
2
12
52
@EvMill
Evan Miller
2 years
Softmax1, Week 2. The second set of empirical results is in, and they are… 🌸 promising 🌸. Weight kurtosis is roughly the same – but activation kurtosis improved 30X (!!) and maximum activation magnitude reduced 15X (!). Read more from @johnowhitaker:
datasciencecastnet.home.blog
Last week a guy called Evan Miller tweeted out a blog post claiming to have discovered a flaw in the attention mechanism used by transformers today. The phrasing was sensationalist, and many people…
1
17
106
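For readers wanting to reproduce the comparison: kurtosis here is the standard fourth-moment statistic, computed over a layer's weights or activations. A sketch of the measurement itself (the 30X/15X figures come from the linked experiments, not from this code):

```python
import numpy as np
from scipy.stats import kurtosis

def tensor_stats(t):
    """Kurtosis and max-magnitude summary for a weight/activation tensor."""
    flat = np.asarray(t).ravel()
    return {
        "kurtosis": kurtosis(flat, fisher=False),  # normal distribution -> 3
        "max_abs": np.abs(flat).max(),
    }

rng = np.random.default_rng(0)
well_behaved = rng.normal(size=100_000)
with_outliers = np.concatenate([well_behaved, [80.0, -75.0, 90.0]])
print(tensor_stats(well_behaved))   # kurtosis ~3, modest max magnitude
print(tensor_stats(with_outliers))  # a few outliers blow up both stats
```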
@EvMill
Evan Miller
2 years
RT @FarisSbahi: Controlling language models has a long way to go - and clever techniques - involving Finite State Machines - offer a way to….
0
21
0
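The idea behind FSM-based control, in miniature: run a finite state machine alongside decoding and exclude any token that would drive the machine into a dead state. A toy character-level sketch (the DFA and "vocabulary" below are made up for illustration; real systems compile regexes or grammars against the model's tokenizer):

```python
# Toy FSM-constrained decoding: only strings matching digits-then-"px"
# (e.g. "12px") are allowed. States: 0=start, 1=in digits, 2=saw 'p', 3=done.
TRANSITIONS = {
    (0, "digit"): 1, (1, "digit"): 1,
    (1, "p"): 2, (2, "x"): 3,
}

def step(state, ch):
    kind = "digit" if ch.isdigit() else ch
    return TRANSITIONS.get((state, kind))  # None means a dead state

def allowed_tokens(state, vocab):
    """Tokens the FSM permits from `state`, checked character by character."""
    ok = []
    for tok in vocab:
        s = state
        for ch in tok:
            s = step(s, ch)
            if s is None:
                break
        else:
            ok.append(tok)  # token consumed without hitting a dead state
    return ok

vocab = ["1", "42", "px", "p", "hello", "x"]
print(allowed_tokens(0, vocab))  # ['1', '42'] -- must start with digits
print(allowed_tokens(1, vocab))  # ['1', '42', 'px', 'p']
```

At generation time, the same check would be applied to the model's logits each step, masking out every token not in the allowed set before sampling.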