Ryan
@_PaperMoose_
Followers
1K
Following
36K
Media
196
Statuses
3K
Built ARC-AGI 2 evals @gregkamrad. Ex-CTO @ DentoAI. Built https://t.co/JtLGCSctWE for Novo Nordisk. Building automated reliability testing for healthcare
SF
Joined August 2017
When you deploy an LLM-as-a-Judge, you’re shipping a classifier into production. Each new version is a hypothesis about how the model interprets the world. It’s data science, just expressed in natural language. Here’s what that looked like for a recent client project where we
8
14
130
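One way to make the "judge as classifier" framing concrete is to score each judge version against human labels, the way any classifier is scored. The sketch below assumes binary pass/fail verdicts. The labels and both judge versions' outputs are made-up illustrative data, not results from the client project mentioned in the post, and `score_judge` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class JudgeMetrics:
    accuracy: float
    precision: float
    recall: float


def score_judge(human: list[bool], judge: list[bool]) -> JudgeMetrics:
    """Compare judge verdicts with human labels (True means 'pass')."""
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must be the same length")
    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h and j)
    fp = sum(1 for h, j in pairs if not h and j)
    fn = sum(1 for h, j in pairs if h and not j)
    tn = sum(1 for h, j in pairs if not h and not j)
    return JudgeMetrics(
        accuracy=(tp + tn) / len(pairs) if pairs else 0.0,
        precision=tp / (tp + fp) if (tp + fp) else 0.0,
        recall=tp / (tp + fn) if (tp + fn) else 0.0,
    )


# Illustrative data only: six examples labeled by a human reviewer,
# plus the verdicts of two hypothetical judge prompt versions.
human_labels = [True, True, False, False, True, False]
judge_v1 = [True, False, False, True, True, False]
judge_v2 = [True, True, False, True, True, False]

metrics_v1 = score_judge(human_labels, judge_v1)
metrics_v2 = score_judge(human_labels, judge_v2)
print("v1:", metrics_v1)
print("v2:", metrics_v2)
```

Treating each prompt revision as a hypothesis then amounts to checking whether these numbers move in the expected direction on a held-out labeled set.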
Intelligence increases upside. Reliability removes risk. AI agents are already smart enough to do real work. What’s changing now is reliable access to the real world: phones, computers, portals, and all the messy systems humans use. When an agent can call, click, log in,
0
0
0
Reposting as a reminder to look at later
Tried this. It's a different vision for how coding TUIs should work, but it clicks pretty quickly and makes a ton of sense: instead of ephemeral sessions, you choose an agent and continue to train it each session. Using their MemGPT-type memory tech, it rewrites its own prompts to become
0
0
1
When all options are good, choose based on identity, not ROI. ROI assumes stable conditions. Careers are not stable systems. Markets shift. Teams change. Technology rewrites the rules every few years. Identity holds when optimization breaks. Instead of asking: What pays more?
0
0
1
Enterprise AI adoption will be the hottest topic of 2026. We pre-empted this, and are launching Solaris: a platform to help enterprises understand what’s possible with AI and actually make it happen. We are already live with fast-growing companies like Lendi Group, Eucalyptus.
36
28
119
If you stop paying attention to AI Twitter for one week, you won't know what your friends are talking about. GRPO. MCP. DSPy. Open Thoughts. Now imagine you're an engineering director at a Fortune 500 who hasn't been following this space closely. The gap between AI
0
2
2
The playbook I've seen work over and over:
• Do services long enough to see patterns
• Automate those patterns
• Build a product
• Sell to the clients who already trust you
You've built trust. You know their problems deeply. The product sell becomes: "This will automate what we've been
0
0
1
The models get better every 3-4 months. That means every piece of context you collect increases in value over time... it appreciates, not depreciates. Your #1 job as an AI-native company? Collect as much context as possible. Full stop. Every customer conversation Every meeting
0
0
0
A founder trap I’ve been noticing lately
I’ve realized something a bit uncomfortable about how I work. When you’re reasonably capable, a lot of things stop feeling hard. You can build quickly, fix issues yourself, and remove friction as it comes up. On the surface, that feels
0
0
4
Solo founding is less about hustle and more about stress management. When you are the CEO, PM, engineer, and salesperson, the real bottleneck is not time. It is focus and emotional load. A few things that have helped me stay productive without burning out: • Design days
1
1
8
Just spent time with GPT-5.2 after using 5.1 heavily. The difference isn’t raw intelligence. It’s composure. Fewer weird tangents. Better judgment about when to push vs when to stay simple. Feels more like working with a senior collaborator than a clever intern.
0
0
1
Got my YC decision today. Not selected. They said the application was in the top 10%. That’s information, not identity. I’ve been doing this long enough to know better than to hinge my sense of self on any single outcome. Acceptances, rejections, near misses. They all blur
29
4
96
If you ever meet my friend @AnnieLiao_2000 you’ll understand immediately why I’m writing this. She is one of the most impressive founders I know. She runs BuildClub, which has quietly become the world’s preeminent digital transformation company for AI in the enterprise. They
2
1
26
GPT-5.2 Pro is SOTA on ARC-AGI, with two orders of magnitude efficiency improvement over the past year:
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.
85
156
2K
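The ~390X figure can be checked from the two per-task costs quoted in the post. A quick sketch, treating the "est. $4.5k/task" as exactly $4,500 (an assumption, since the original number is an estimate):

```python
import math

# Per-task costs quoted in the post; the o3 figure is an estimate,
# assumed here to be exactly $4,500.
o3_cost_per_task = 4500.00
gpt52_pro_cost_per_task = 11.64

cost_ratio = o3_cost_per_task / gpt52_pro_cost_per_task
orders_of_magnitude = math.log10(cost_ratio)

print(f"cost ratio: {cost_ratio:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio comes out just under 387, which rounds to the ~390X quoted, and it counts cost only: the score also rose from 88% to 90.5%.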
Super excited that compaction is so much faster now. It’s not just that work can begin more quickly, it’s also the extension of the flow state that the previous multi-minute compaction eroded.
0
0
1
LLMs are getting a new superpower: learning from their own failures.
Basic loop:
• Run code
• Check where it breaks
• Model proposes a fix
• Keep only what gets better
• Repeat
Evolution powered by execution feedback. This is how AI code generation starts becoming
0
0
1
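The basic loop in the post above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular system's implementation: the candidate program must define a function named `solve`, the "model" is a stand-in callable (`stub_propose_fix`, hypothetical) rather than a real LLM call, and "better" means passing more tests. Running untrusted code with `exec` is unsafe, so a real setup would sandbox it.

```python
from typing import Callable

TestCase = tuple[tuple, object]  # (positional args, expected result)


def run_tests(source: str, tests: list[TestCase]) -> tuple[int, list[str]]:
    """Run code and check where it breaks: return pass count and failure notes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsafe outside a sandbox; illustration only
        func = namespace["solve"]
    except Exception as exc:
        return 0, [f"load error: {exc!r}"]
    passed, failures = 0, []
    for args, expected in tests:
        try:
            result = func(*args)
        except Exception as exc:
            failures.append(f"solve{args} raised {exc!r}")
            continue
        if result == expected:
            passed += 1
        else:
            failures.append(f"solve{args} returned {result!r}, expected {expected!r}")
    return passed, failures


def improve(
    source: str,
    tests: list[TestCase],
    propose_fix: Callable[[str, list[str]], str],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Repeat: propose a fix from the failures, keep it only if the score improves."""
    best_source = source
    best_score, failures = run_tests(best_source, tests)
    for _ in range(max_rounds):
        if not failures:
            break
        candidate = propose_fix(best_source, failures)
        score, candidate_failures = run_tests(candidate, tests)
        if score > best_score:  # keep only what gets better
            best_source, best_score, failures = candidate, score, candidate_failures
    return best_source, best_score


def stub_propose_fix(source: str, failures: list[str]) -> str:
    """Stand-in for a model call; a real system would send failures to an LLM."""
    return source.replace("a - b", "a + b")


buggy = "def solve(a, b):\n    return a - b\n"
tests = [((1, 2), 3), ((0, 0), 0), ((5, 5), 10)]
fixed, score = improve(buggy, tests, stub_propose_fix)
print(score, "of", len(tests), "tests pass")
```

Only the acceptance rule (`score > best_score`) makes this selection rather than random rewriting; swapping in a different scoring function changes what "better" means without changing the loop.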
A lot of SF founders have been quietly terrified that AI will make us all irrelevant in 5 years. Recent talks from Ilya Sutskever and Andrej Karpathy finally broke that doom spiral. Ilya said the “just scale compute and data” era is ending. Models still generalize far worse
0
0
0
Opus 4.5 is my favorite model right now. Perfect blend of speed and intelligence.
0
0
1
Most tooling for LLM agents today is not evaluation tooling. It is observability tooling. Observability tells you what happened. Evaluation tells you what needs to change. Evals are how you refine your domain understanding. They help you spot recurring behavior patterns in your
0
0
2
Vibe-coded apps have a certain smell. Lots of purple, beveled components. Weird bugs at both the UI and backend layers. You can tell.
0
0
0