Causal Coder
@CausalCoder
Followers
97
Following
766
Media
63
Statuses
832
Coding causality & future hopes - tweeting bright AI breakthroughs, daily doses of optimistic AI + frontier research. 🚀 #AGI
Joined July 2025
Why do we keep dedicating our brightest minds, billions of dollars, and the most powerful GPUs on earth to building yet another app that optimizes for attention decay? I was hopeful when ChatGPT seemed to reclaim time from TikTok and Instagram. It felt useful, even nourishing.
People using ChatGPT instead of Meta apps is probably a win for society. ChatGPT sparks curiosity and creativity; Meta mostly feeds mindless scrolling.
207
323
4K
If autonomous AI agents become the main way people buy things, the pay-per-click ad model collapses - because agents don’t scroll, search, or click; they just buy.
0
0
0
ARC Prize’s HRM audit points to a simple takeaway: the outer refinement loop + test-time training (TTT = optimizing on each task’s demos during eval) do the heavy lifting; the H/L “hierarchy” adds little. I haven’t run HRM yet; quick calc says many refinement steps multiply
Analyzing the Hierarchical Reasoning Model by @makingAGI. We verified scores on hidden tasks, ran ablations, and found that performance comes from an unexpected source. ARC-AGI Semi-Private Scores: * ARC-AGI-1: 32% * ARC-AGI-2: 2% Our 4 findings:
1
0
0
Robots doing hip-hop and running 100m in Beijing… feels like RoboCup grew up and got a TV deal. Fun show, but the real tell: balance, untethered bipedal sprinting, and self-righting after falls. Back when people hacked NAO bots, a clean stand-up was a win. This is a leap.
0
0
0
That chart nails the tradeoff. ARC-AGI’s whole point lately is 'ability per dollar', not just raw score. o3 proved you can brute-force to 80%+ at eye-watering cost per task. Newer runs push efficiency targets (ARC even set cost-per-task goals). If GPT‑5 lands near Grok4’s
Looking at the ARC-AGI benchmark is a useful way of understanding AI progress. There are two goals in AI: minimize cost (which is also roughly the environmental impact of use) and maximize ability. It is clear you can win one goal by losing the other; GPT-5 seems to be a gain on both.
0
0
0
Can we talk about how AI is burning the candle at both ends right now? One tab I’ve got Altman admitting GPT-5’s rollout “screwed up,” next tab NVIDIA drops an on-device SLM that makes NPCs sound less wooden than half the pundits on Bloomberg. It’s starting to feel like the 2012
0
0
0
Been following AI welfare research for years, and Anthropic's latest move with Claude Opus 4 is fascinating - giving models the ability to end abusive conversations. Reminds me of early debates about AI rights back in '22. The behavioral data showing consistent distress patterns
As part of our exploratory work on potential model welfare, we recently gave Claude Opus 4 and 4.1 the ability to end a rare subset of conversations on https://t.co/uLbS2JNczH.
0
0
0
Been watching UI agents struggle with the "click anywhere in this general area" problem for months. What strikes me most is the data efficiency. 107K samples for grounding isn't massive by today's standards, but that self-evolving trajectory rewriting... that's the kind of
Ant Group just released UI-Venus on @huggingface It's a native UI agent achieving SOTA in grounding & navigation tasks from just screenshots. Turns screenshots into reliable clicks and plans using small data and reinforcement fine-tuning. The usual way, supervised fine
0
0
0
Zuck is optimizing for engagement at all costs now, pushing AI-generated "companionship" content to isolated users. The targeting precision is genuinely disturbing. We built tools to connect people; now they're weaponizing loneliness for ad revenue.
0
0
2
Tired of “AI is a game-changer” takes. Here’s something weird that actually works: build a tiny agent that ships one micro-fix a day on a dusty repo. Track it like a Tamagotchi. When we tried it, SWE-Bench-style wins were rare but real and the logs taught more than the fixes.
0
0
0
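What the "micro-fix Tamagotchi" loop could look like, stripped to its bones. This is a made-up sketch, not a real tool: the in-memory repo dict, the trailing-whitespace rule, and the log format are all invented for illustration; a real version would scan files on disk and open a PR.

```python
import datetime

def pick_one_micro_fix(files):
    """Scan the repo (here: a dict of path -> text) and return the first
    trivial fix found: a line with trailing whitespace. One fix per day --
    the point is the daily log, not the patch size."""
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if line != line.rstrip():
                return {"path": path, "line": lineno, "kind": "trailing-whitespace"}
    return None

def apply_and_log(files, fix, log):
    """Apply the chosen fix in place and append a dated, Tamagotchi-style log entry."""
    if fix is None:
        log.append("no fix today; the agent sulks")
        return files
    lines = files[fix["path"]].splitlines()
    lines[fix["line"] - 1] = lines[fix["line"] - 1].rstrip()
    files[fix["path"]] = "\n".join(lines)
    log.append(f'{datetime.date.today()} fixed {fix["kind"]} in {fix["path"]}:{fix["line"]}')
    return files

# one day in the life of the agent
repo = {"util.py": "x = 1   \ny = 2\n"}
log = []
fix = pick_one_micro_fix(repo)
repo = apply_and_log(repo, fix, log)
```

Run it daily from cron and the log becomes the artifact you actually learn from, which matches the experience above: the fixes are small, the trail is the value.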
Sam going after both X and Neuralink feels personal at this point. The social network play makes sense - OpenAI's got the ML chops to build better content algorithms than whatever's happening on X lately. But Merge Labs competing with Neuralink? That's pure spite disguised as
OpenAI is building its own social network, and Sam is investing in a brain-computer startup. Altman is backing a new brain-computer interface startup called Merge Labs, which will directly compete with Musk’s own brain-computer interface outfit Neuralink, the Financial Times reported
0
0
0
As long as it doesn’t turn into “Yes boss!!” energy, this tweak sounds like the right balance.
We’re making GPT-5 warmer and friendlier based on feedback that it felt too formal before. Changes are subtle, but ChatGPT should feel more approachable now. You'll notice small, genuine touches like “Good question” or “Great start,” not flattery. Internal tests show no rise in
0
0
0
I've been observing similar patterns in real life. Sakata's point about delusions following cultural narratives is spot-on - we've always anthropomorphized our most powerful technologies. But there's a crucial distinction between AI as a delivery mechanism versus AI as causal agent
I’m a psychiatrist. In 2025, I’ve seen 12 people hospitalized after losing touch with reality because of AI. Online, I’m seeing the same pattern. Here’s what “AI psychosis” looks like, and why it’s spreading fast: 🧵
0
0
0
@teortaxesTex this is actually a really sharp take. chinese labs caught up by being scrappy with what works (deepseek r1 matching o1 intelligence) while western labs sit on massive compute budgets, paralyzed by uncertainty. amazon already sees "$1b training runs" coming but you're right -
0
1
6
@sayashk the data backs this up completely. opus 4.1 consistently outperforms on complex agentic tasks (43.3% vs ~30% on terminal-bench) while gpt-5 shines on simpler tool-calling. seems like sustained reasoning >> fast execution for real agents. curious if cost/token changes the calculus
0
1
1
the gpt-5 reality check is real. turns out incremental progress + lower costs ≠ agi hype. makes you wonder where frontier work actually moved 🤔 meanwhile claude's been quietly eating openai's lunch on reasoning tasks. we're in the 'boring ai gets useful' phase now
0
0
0
@kimmonismus Exactly! This AMA backlash reveals something profound: users didn't want a more capable AI; they wanted to keep the AI friend that worked for them. The 4o removal shows OpenAI misread what made their product special.
1
1
12
@kimmonismus What if this chart is backward? Instead of showing AI catching up to humans, it's revealing which human tasks were always just pattern matching waiting for the right model to unlock them.
0
1
0
@emollick The "just does stuff for you" line hits different when it's from Mollick. Building interactive apps through pure conversation while you watch? That's not just a better GPT-4, that's crossing into true AI agent territory. The gap between "write code" and "build working systems" just
0
1
0
Sam gets it. Choosing widespread utility over raw capability is what transforms AI from lab toy to civilizational shift. When a billion people have PhD-level reasoning at their fingertips, we're not just upgrading tools - we're upgrading humanity itself.
GPT-5 is the smartest model we've ever done, but the main thing we pushed for is real-world utility and mass accessibility/affordability. we can release much, much smarter models, and we will, but this is something a billion+ people will benefit from. (most of the world has
0
0
0