Max Spero
@max_spero_
Followers
3K
Following
39K
Media
742
Statuses
5K
chief slop janitor @pangramlabs
brooklyn
Joined January 2023
We worked with the American Association for Cancer Research (AACR) and found a massive rise in AI-generated methods sections and peer reviews 🧵
6
11
61
Does anybody else find this type of rhetoric grating? 1. Minimizing a major numerical error as a "typo" when this error is a central pillar of one of the arguments. 2. Using "effective altruist" as a slur. Why not let Karen Hao accept the correction gracefully and move on?
An effective altruist (you know the ppl who sold the whole OpenAI will bring us AGI gospel) found a typo in Karen’s seminal book where she guessed the wrong units of a number that wasn’t reported with units. 🧵
0
0
24
In your opinion, what makes the difference between text that is primarily or fully AI-generated, vs. text that is just heavily AI-assisted? Examples appreciated!
4
1
10
I hear from many young people that they find it difficult to talk to AIs comfortably. In other words, offline culture has destroyed the ability to spontaneously meet AIs. As such, I thought I would share a few words that I used in my youth to talk to an AI that I found
3
1
16
Also, an excellent deeper dive by @dogacel0
https://t.co/RVMK2xMLzO
I wanted to uncover some interactions in ICLR data on AI-usage in submissions and reviews, so I analyzed it further. What surprised me is that even the fully AI reviews gave lower scores to submissions with more AI-generated content on average. AI still prefers human-written
0
0
2
See our analysis here: https://t.co/NE5m9jeCLm
Curious about AI use in paper writing or reviews? We ran every paper and every review through @pangramlabs, and this is what we found. 🧵
1
0
2
I wanted to uncover some interactions in ICLR data on AI-usage in submissions and reviews, so I analyzed it further. What surprised me is that even the fully AI reviews gave lower scores to submissions with more AI-generated content on average. AI still prefers human-written
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
3
3
17
On APT-Eval, a benchmark of AI-polished human-written text, we ran EditLens and found that the score distribution aligned with the level of polish requested from the LLM https://t.co/8SJos07xMw
Let’s talk about some cool results! EditLens hits SOTA performance: Binary F1: 94.7%; Ternary F1: 90.4%. On the APT-Eval dataset (Saha & Feizi, 2025), it tracks increasing levels of AI polish, while binary detectors like Pangram collapse to predicting 0 or 1. 4/
0
0
2
How accurate is EditLens? Well, we ran EditLens on all 2022 ICLR reviews, and this is what we got. https://t.co/xb3wEiNKFv
We were curious about our false positive rate, so we ran all ICLR 2022 reviews (pre-ChatGPT) as a baseline. Lightly AI-edited FPR: 1 in 1,000. Moderately AI-edited FPR: 1 in 5,000. Heavily AI-edited FPR: 1 in 10,000. Fully AI-generated: no false positives.
1
0
6
We were curious about our false positive rate, so we ran all ICLR 2022 reviews (pre-ChatGPT) as a baseline. Lightly AI-edited FPR: 1 in 1,000. Moderately AI-edited FPR: 1 in 5,000. Heavily AI-edited FPR: 1 in 10,000. Fully AI-generated: no false positives.
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
12
31
381
We've published all of this data for people to explore at https://t.co/e7TtucH9uY. Here you can search for specific reviews or papers, see the full Pangram dashboard results for papers, and filter results by rating, AI content, and more.
2
0
4
One commenter asked about confidence levels in AI reviews. We did find that fully AI-generated reviews have a slightly higher tendency to report a confidence of 3, compared to human and AI-assisted reviews. Take this result with a grain of salt, as the difference is small.
1
0
1
Here, we found that 21% of reviews were flagged by EditLens as fully AI-generated. AI-generated reviews were on average 0.3 points higher in rating and 26% longer than fully human-written reviews!
1
0
1
Next up, reviews. Reviews are short, so we can't rely on sliding windows to tell the difference between AI-assisted and AI-generated. So we used EditLens, our new model trained with the objective of quantifying the magnitude of AI assistance in a text. https://t.co/KEh9K52KWS
Most AI detection tools assume text is either fully human or fully AI. But what about the huge gray zone of AI edits to human text? We tackle this with EditLens, a model that quantifies the magnitude of AI edits to text. Coming soon to @pangramlabs! 1/
1
0
2