
search founder
@n0riskn0r3ward
Followers: 2K
Following: 10K
Media: 276
Statuses: 4K
Solo entrepreneur passionate about AI and search tech. Building a niche search product and sharing what I learn along the way.
Joined June 2022
Fun new benchmark - instruction following is a key thing to measure for enterprise use cases - but you gotta post the money shot up front! Also want to see all the usual players: Opus 4, R1-0528, o4-mini, Gemini Flash, etc., with cost shown too, like aider!
Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵
When I haven't used Claude at all over the course of a week, I'm paying $20 a month, and I issue a new query only to get "sorry, we can't help you because Claude is overwhelmed at the moment," it makes @AnthropicAI feel like an unserious company.
RT @SinclairWang1: What Makes a Base Language Model Suitable for RL?. Rumors in the community say RL (i.e., RLVR) on LLMs is full of “myste….
"Specialty metal items" sounds like a better business idea than whatever Perplexity is going to pivot to next lol.
Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink. After someone randomly decided to ask it to order a tungsten cube, Claude ended up with an inventory full of (as it put it) “specialty metal items” that it ended up selling at a loss.