Stella Biderman
@BlancheMinerva
Followers
17K
Following
12K
Media
647
Statuses
13K
Ensuring that tech companies don't have a monopoly on being able to do research on cutting edge AI @AiEleuther. She/her
Joined May 2019
The actual American thing to do is not report people to a government that regularly breaks the law, ignores people's rights, and sends them to torture and death camps. You have a moral obligation to hide people from ICE just like you did to hide them from the Gestapo.
6
6
93
Some developers say AI is now a massive productivity booster. Are they right? @METR_Evals is running another study to measure this. HMU if you want to participate
17
14
131
🎙️New Episode: November 6, 2025
- Unpacking election night
- Separating noise from substance on election takeaways
- Zohran's big government warning
- Shutdown day 37 - cracks form with suspicious timing
- Trump's tariffs at SCOTUS
- AP and climate nerds take on PETS
- Viewer Mail!
11
10
39
I’m so excited that Global PIQA is out! This has been a herculean effort by our 300+ contributors. The result is an extremely high-quality, culturally-specific benchmark for over 100 languages.
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
1
7
33
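For context on how a two-choice benchmark like this is typically scored: the model assigns a likelihood to each candidate solution, and the higher-scoring one counts as its answer. The sketch below is not Global PIQA's official harness; the model name and the example item are placeholders.

```python
# Minimal sketch of scoring a PIQA-style two-choice item with a causal LM,
# assuming a HuggingFace model; not the official Global PIQA evaluation code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-1.4b"  # placeholder model, not an official baseline
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sequence_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` given `prompt`."""
    # Note: tokenizing prompt and prompt+continuation separately can shift BPE
    # boundaries slightly; acceptable for a sketch.
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)   # predicts tokens 1..end
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()        # continuation tokens only

# Hypothetical PIQA-style item (not taken from the dataset)
goal = "To keep rice from sticking to the pot,"
solutions = [" rinse it before cooking.", " add sugar to the water."]
scores = [sequence_logprob(goal, s) for s in solutions]
print("model picks:", solutions[scores.index(max(scores))])
```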
Michaël is far from the only person I've seen defend the conversion, but he gets huge applause for being one of the only ones to publicly admit to being fooled.
Six years ago I defended OpenAI LP when everyone called it a bait-and-switch. After November 2023 and the recapitalization, I feel like a fool.
2
1
69
Our #NeurIPS2025 paper shows that even comparable monolingual tokenizers have different compression rates across languages. But by getting rid of whitespace tokenization and using a custom vocab size for each language, we can reduce token premiums. Preprint out now!
4
9
41
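For anyone unfamiliar with "token premiums": the same content costs more tokens in some languages than others under a given tokenizer. A rough way to see this, assuming a pretrained tokenizer and a pair of roughly parallel sentences as placeholders (not the paper's actual measurement protocol):

```python
# Toy illustration of measuring a token premium by tokenizing parallel text
# and comparing token counts. Tokenizer and sentences are placeholders.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # placeholder tokenizer

parallel = {
    "en": "The cat sat quietly on the warm windowsill.",
    "fi": "Kissa istui hiljaa lämpimällä ikkunalaudalla.",
}

counts = {lang: len(tok(text).input_ids) for lang, text in parallel.items()}
premium = counts["fi"] / counts["en"]  # >1 means Finnish "pays" more tokens for the same content
print(counts, f"token premium: {premium:.2f}")
```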
We've been cooking something up in the studio. The @nvidiastudio that is, letting us create without limits.
15
4
115
@AISecurityInst I think the dynamics within a model regarding robust knowledge (DeepIg) and information about specific details relevant to privacy concerns (Hubble) are plausibly different. The fact that both have an 8B model trained for about 500B tokens on DCLM is extra fortuitous!
0
0
1
@AISecurityInst One thing that's been on my mind during this work is that different application contexts for MU probably require different test-beds. I would not be surprised if something worked on one suite and not the other, since
1
0
1
Using these models to investigate unlearning of PII and similar data seems like a really good idea. I strongly recommend pairing this with the DeepIgnorance models we recently released w/ @AISecurityInst, which have similar pairs in another context
Are you afraid of LLMs teaching people how to build bioweapons? Have you tried just... not teaching LLMs about bioweapons? @AIEleuther and @AISecurityInst joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study
1
0
2
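To make the "just don't teach it" idea concrete: filter the pretraining corpus before training. The sketch below is not the actual DeepIgnorance filtering pipeline (which is described in the paper); it's a toy stand-in with placeholder terms, and real filters typically rely on trained classifiers rather than blocklists.

```python
# Toy illustration of pretraining-data filtering; NOT the DeepIgnorance pipeline.
BLOCKLIST = {"example_hazard_term_1", "example_hazard_term_2"}  # placeholder terms

def keep_document(doc: str) -> bool:
    """Return True if the document passes the (toy) safety filter."""
    lowered = doc.lower()
    return not any(term in lowered for term in BLOCKLIST)

def filter_corpus(docs):
    """Yield only documents that pass the filter, before tokenization/training."""
    for doc in docs:
        if keep_document(doc):
            yield doc
```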
I also really appreciate the positioning to use this to study machine unlearning. Personally, I don't really believe any LLM unlearning papers, because nobody checks whether the unlearned model behaves like a model that never saw the data, and the evidence we do have is negative.
1
0
3
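The comparison being asked for, sketched minimally: evaluate the unlearned model not only against the original, but against a reference model that never saw the forget set. The metric here (mean per-token NLL on the forget set) and the model handles are placeholder choices, not a prescribed protocol.

```python
# Sketch: how far is the "unlearned" model from a model that never trained on the data?
import torch

@torch.no_grad()
def mean_nll(model, tok, texts):
    """Average per-token negative log-likelihood over a list of texts."""
    total, n_tokens = 0.0, 0
    for t in texts:
        ids = tok(t, return_tensors="pt").input_ids
        out = model(ids, labels=ids)          # HF causal LMs return mean loss
        total += out.loss.item() * (ids.shape[1] - 1)
        n_tokens += ids.shape[1] - 1
    return total / n_tokens

def unlearning_gap(unlearned_model, never_trained_model, tok, forget_set):
    """Gap between the unlearned model and the never-trained reference on the forget set."""
    return abs(mean_nll(unlearned_model, tok, forget_set)
               - mean_nll(never_trained_model, tok, forget_set))
```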
On November 6, 2016 (9 years ago) I stood in front of the Hilton in NYC and said "Donald Trump will become the 45th President of the United States. It will be a late night, but I am calling it a victory". A lot has changed over the past 9 years but as I wrote in my book (The
15
9
67
Really great work! If you're interested in doing research on memorization in LLMs, this is a great suite to use.
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
2
3
26
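The "controlled insertion" idea, in toy form (not Hubble's actual data pipeline): splice known passages into the pretraining stream at fixed duplication counts, so memorization can later be measured per passage as a function of how often it was seen.

```python
# Toy sketch of controlled insertion of canary passages into a pretraining corpus.
import random

def insert_canaries(corpus_docs, canaries, copies_per_canary=10, seed=0):
    """Return the corpus with each canary passage duplicated a known number of times."""
    rng = random.Random(seed)
    docs = list(corpus_docs)
    for passage in canaries:
        for _ in range(copies_per_canary):
            docs.insert(rng.randrange(len(docs) + 1), passage)
    return docs
```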
We are currently looking into whether Pythia 2.8B deduped was actually trained on the standard Pile. We'll let everyone know what we find!
0
0
2
This line of research from Sally and her collaborators is incredible and something I recommend every chance I get.
🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨
1
2
37
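The thread doesn't spell out the test, so the sketch below is only a generic stand-in for this kind of provenance check, not the method from the paper: score the suspect text under Alice's model and under an independent reference model, and treat a systematically higher likelihood under Alice's model as evidence of provenance.

```python
# Generic likelihood-comparison stand-in; NOT the paper's data-shuffling-based test.
import torch

@torch.no_grad()
def mean_token_logprob(model, tok, text: str) -> float:
    """Average log-probability per predicted token of `text` under `model`."""
    ids = tok(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)
    return -out.loss.item()

def provenance_score(alice_model, reference_model, tok, suspect_text: str) -> float:
    """Positive values mean Alice's model finds the text more likely than the reference does."""
    # Assumes both models share the same tokenizer, purely for simplicity of the sketch.
    return (mean_token_logprob(alice_model, tok, suspect_text)
            - mean_token_logprob(reference_model, tok, suspect_text))
```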
We are announcing an opportunity for paid question writers to contribute to a new PhD-level math benchmark. Accepted contributors will be paid per question and will be invited to be authors on the resulting dataset paper. Check out the link below for more information!
1
5
23
Just in case anyone is wondering, it's not just SNAP that isn't being paid during the shutdown. I just got a $645 bill from O&R for gas and electricity. The last bill I paid, with HEAP, was $94.00. I was told, by them, that I am responsible for the entire amount, as they
1
2
7
I previously tweeted about the fact that I spotted two very similar papers and that it seemed sus. I didn't look very carefully at the papers in question and failed to notice that one paper cited the other. In reality it was two consecutive papers by the same authors. I'm sorry.
@BlancheMinerva Now that both papers are public on arXiv, as the first author of both papers, I want to clarify the relationship between the two submissions. The earlier paper, PonderLM, is a NeurIPS resubmission that was initially posted on arXiv in May. Following its original submission,
5
1
142
It hadn't occurred to me that the answer might be two submissions from the same team. Note that I hadn't Googled them because I don't Google papers until I have my full batch of papers to review. I'm pro-preprint, but I don't seek out papers I might be reviewing.
2
0
41
My favorite part of ICLR submissions going public is playing "plagiarized, strange coincidence, or AI-generated"
14
21
322
The city of San Francisco spends hundreds of millions of dollars per year on homelessness, but the problem continues spiraling out of control. @beyondhomeless1
7
3
18
COLM is inevitable, really. because LLMs are not really about language/nlp and are certainly not about loss-based ML. they are something new. and they deserve their own thing. i'm glad dipanjan et al realized this and pushed forward.
1
9
135
@CommonCrawl @MLCommons As a final transparency note we had two papers rejected from CoLM, "Evaluating SAE interpretability without explanations" and "Sparse Autoencoders Trained on the Same Data Learn Different Features." Both papers have preprints on arXiv, and we hope to present them at another
0
1
11
And w/ @CommonCrawl and @MLCommons we are hosting the 1st Workshop on Multilingual Data Quality Signals. The core purpose of this workshop is to improve our ability to automatically identify high quality data in diverse languages. Join us on Oct 10th! https://t.co/EzxJ583NBQ
wmdqs.org
A workshop addressing multilingual data quality. Held on 10 October 2025 in Montréal.
1
3
3