
Simon Smith
@_simonsmith
Followers: 365
Following: 935
Media: 505
Statuses: 2K
We really should switch to drone shows. Better for the environment, our health, and technological progress.
12 hrs after fireworks. Competing for most toxic air in the world. Purple areas = smoking 20 cigarettes. Ultrafine soot crosses blood–brain barrier. Toxins now entering water and food. Celebrating life, our customs bathing us in death.
0
0
0
Another amazing example of o3's health benefits, and AI's in general. This is one reason I now have a health project in ChatGPT that includes my genetic information from 23andMe.
This story is going wildly viral on Reddit. ChatGPT flagged a hidden gene defect that doctors missed for a decade. ChatGPT ingested the patient’s MRI, CT, broad lab panels and years of unexplained symptoms. It noticed that normal serum B12 clashed with nerve pain and fatigue,
0
2
6
Important finding and can help us improve benchmarks. Models can guess answers on multiple choice benchmarks WITHOUT seeing the questions! Better than multiple choice is to have them generate an answer, and have an LLM compare that to a reference answer. Cheaper, too.
There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations. ❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer
0
0
0
Grok 4 benchmark performance looks good, but I await (1) release and (2) independent benchmarking, including on good established real-world benchmarks (e.g. SWE-Lancer, METR time horizon), and private benchmarks with no risk of data contamination.
Grok-4 and Grok-4 Code on benchmarks: 35% on HLE, 45% with reasoning!!; 87-88% on GPQA; 72-75% on SWE Bench (Grok 4 Code).
0
0
3
This is interesting. Not that AI companies needed more incentive to train models, but tax incentives could be useful, especially for public companies that might want to alleviate concerns from some anxious shareholders.
Under the big, beautiful bill, AI training compute expenses qualify as R&D, making them immediately deductible in full during the year they're incurred.
0
0
1
Experts are bad at making predictions, as @PTetlock has documented. Yet the media consistently trots out expert predictions about things like AGI being decades away as if expertise makes those predictions more likely.
"The median expert forecasted that AI would not match a top team of biologists on a virology troubleshooting questionnaire until 2030. but this was actually achieved in just the few months after we conducted the survey". AI progress comes at you fast.
0
0
2
Interesting. ChatGPT users haven't entirely given up on Google yet. I'm curious to know what they're typically doing there. I sometimes use Google for quick navigation searches. Perhaps that's common?
Audience Overlap: Google 🔵 x ChatGPT 🟠. Over the last 12 months, 95.8% of ChatGPT users visited Google, while 9.8% of Google users visited ChatGPT.
0
0
1
This is the best generative video game model I've seen yet, though I found the controls take a bit of time to register.
💥💥BANG! Experience the future of gaming with our real-time world model for video games! 🕹️🕹️ Not just PLAY, but CREATE! Introducing Mirage, the world’s first AI-native UGC game engine. Now featuring real-time playable demos of two games: 🏙️ GTA-style urban chaos. 🏎️ Forza
0
1
4
Interesting, but how do they know what queries people are asking ChatGPT? Referrals I get. Even then, I imagine those increases in referrals aren't compensating for decreases from Google.
News-related queries on ChatGPT are rapidly increasing. Between January 2024 and May 2025, news-related prompts in ChatGPT rose by 212%. Stocks, finance, and sports dominate the share of news prompts.
0
0
1
Today's AI models are SO smart, yet also so dumb.
The agents manage their own memories, and compress them when they get too long. o3 decided to track links to Google Doc using *truncated* IDs! Maybe this is why it was so unreliable at sharing links. The village had to request working links from the Claudes to get anywhere
0
0
0