Ryan

@_PaperMoose_

Followers
1K
Following
36K
Media
196
Statuses
3K

Built ARC-AGI 2 evals @gregkamrad. Ex-CTO @ DentoAI. Built https://t.co/JtLGCSctWE for Novo Nordisk. Building automated reliability testing for healthcare

SF
Joined August 2017
@_PaperMoose_
Ryan
2 months
When you deploy an LLM-as-a-Judge, you’re shipping a classifier into production. Each new version is a hypothesis about how the model interprets the world. It’s data science, just expressed in natural language. Here’s what that looked like for a recent client project where we
8
14
130
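One way to read "a judge is a classifier" in the post above: score each judge version against human-labeled examples, as you would any other classifier. This is a minimal sketch, not the client project's method. The `judge` callable, the `keyword_judge` stand-in (no LLM is called), and the four labeled examples are all hypothetical.

```python
from typing import Callable, Iterable, Tuple


def score_judge(judge: Callable[[str], bool],
                labeled: Iterable[Tuple[str, bool]]) -> dict:
    """Compare a judge's verdicts with human labels; return confusion counts and metrics."""
    tp = fp = fn = tn = 0
    for text, label in labeled:
        verdict = judge(text)
        if verdict and label:
            tp += 1
        elif verdict and not label:
            fp += 1
        elif not verdict and label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


def keyword_judge(text: str) -> bool:
    """Hypothetical stand-in for an LLM judge: flags any text mentioning 'refund'."""
    return "refund" in text.lower()


# Hypothetical hand-labeled set: True means "this is a refund request".
EXAMPLES = [
    ("Please refund my order", True),
    ("Refund policy page", False),
    ("I want my money back", True),
    ("Great service", False),
]

report = score_judge(keyword_judge, EXAMPLES)
print(report)
```

Each new judge prompt is then a new hypothesis: rerun `score_judge` on the same labeled set and compare precision and recall before shipping it.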
@_PaperMoose_
Ryan
3 hours
Intelligence increases upside. Reliability removes risk. AI agents are already smart enough to do real work. What’s changing now is reliable access to the real world: phones, computers, portals, and all the messy systems humans use. When an agent can call, click, log in,
0
0
0
@_PaperMoose_
Ryan
6 hours
Reposting as a reminder to look at later
@tobi
tobi lutke
1 day
Tried this. It's a different vision for how coding TUIs should work, but it clicks pretty quickly and makes a ton of sense: instead of ephemeral sessions, you choose an agent and continue to train it each session. Using their MemGPT-type memory tech, it rewrites its own prompts to become
0
0
1
@_PaperMoose_
Ryan
18 hours
When all options are good, choose based on identity, not ROI. ROI assumes stable conditions. Careers are not stable systems. Markets shift. Teams change. Technology rewrites the rules every few years. Identity holds when optimization breaks. Instead of asking: What pays more?
0
0
1
@AnnieLiao_2000
Annie Liao
1 day
Enterprise AI adoption will be the hottest topic of 2026. We pre-empted this, and are launching Solaris: a platform to help enterprises understand what’s possible with AI and actually make it happen. We are already live with fast growing companies like Lendi Group, Eucalyptus.
36
28
119
@_PaperMoose_
Ryan
1 day
If you stop paying attention to AI Twitter for one week, you won't know what your friends are talking about. GRPO. MCP. DSPy. Open Thoughts. Now imagine you're an engineering director at a Fortune 500 who hasn't been following this space closely. The gap between AI
0
2
2
@_PaperMoose_
Ryan
2 days
The playbook I've seen work over and over: • Do services long enough to see patterns • Automate those patterns • Build a product • Sell to the clients who already trust you. You've built trust. You know their problems deeply. The product sell becomes: "This will automate what we've been
0
0
1
@_PaperMoose_
Ryan
2 days
The models get better every 3-4 months. That means every piece of context you collect increases in value over time... it appreciates, not depreciates. Your #1 job as an AI-native company? Collect as much context as possible. Full stop. Every customer conversation Every meeting
0
0
0
@_PaperMoose_
Ryan
3 days
A founder trap I’ve been noticing lately: I’ve realized something a bit uncomfortable about how I work. When you’re reasonably capable, a lot of things stop feeling hard. You can build quickly, fix issues yourself, and remove friction as it comes up. On the surface, that feels
0
0
4
@_PaperMoose_
Ryan
4 days
Solo founding is less about hustle and more about stress management. When you are the CEO, PM, engineer, and salesperson, the real bottleneck is not time. It is focus and emotional load. A few things that have helped me stay productive without burning out: • Design days
1
1
8
@_PaperMoose_
Ryan
5 days
Just spent time with GPT-5.2 after using 5.1 heavily. The difference isn’t raw intelligence. It’s composure. Fewer weird tangents. Better judgment about when to push vs when to stay simple. Feels more like working with a senior collaborator than a clever intern.
0
0
1
@_PaperMoose_
Ryan
6 days
Got my YC decision today. Not selected. They said the application was in the top 10%. That’s information, not identity. I’ve been doing this long enough to know better than to hinge my sense of self on any single outcome. Acceptances, rejections, near misses. They all blur
29
4
96
@_PaperMoose_
Ryan
7 days
If you ever meet my friend @AnnieLiao_2000 you’ll understand immediately why I’m writing this. She is one of the most impressive founders I know. She runs BuildClub, which has quietly become the world’s preeminent digital transformation company for AI in the enterprise. They
2
1
26
@gdb
Greg Brockman
7 days
GPT-5.2 Pro is SOTA on ARC-AGI, with two orders of magnitude efficiency improvement over the past year:
@arcprize
ARC Prize
7 days
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.
85
156
2K
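The ~390X figure in the post above follows from the two per-task costs it quotes. A quick check, assuming "est. $4.5k" means exactly $4,500 (the variable names are mine, not ARC Prize's):

```python
import math

old_cost_per_task = 4500.00   # o3 (High) preview, est. $4.5k/task (assumed exact)
new_cost_per_task = 11.64     # GPT-5.2 Pro (X-High), $/task

# Cost ratio between the two verified runs.
ratio = old_cost_per_task / new_cost_per_task

# Orders of magnitude = base-10 log of the ratio.
orders_of_magnitude = math.log10(ratio)

print(f"cost ratio: {ratio:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio comes out a little under 390 and rounds to it, which is between two and three orders of magnitude. The comparison covers cost per task only; the scores (88% vs. 90.5%) also differ.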
@_PaperMoose_
Ryan
8 days
Super excited that compaction is so much faster now. It’s not just that work can begin more quickly; it also extends the flow state that the previous multi-minute compaction eroded.
0
0
1
@_PaperMoose_
Ryan
9 days
LLMs are getting a new superpower: Learning from their own failures. Basic loop: • Run code • Check where it breaks • Model proposes a fix • Keep only what gets better • Repeat Evolution powered by execution feedback. This is how AI code generation starts becoming
0
0
1
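The loop in the post above (run, check, propose, keep improvements, repeat) can be sketched as a simple hill climb on test failures. This is a toy illustration under stated assumptions: `propose_fix` stands in for a model call, the canned proposals replace real model output, and the `add` test cases are made up.

```python
from typing import Callable

# Hypothetical test cases for a function named `add`: (args, expected result).
CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]


def count_failures(source: str) -> int:
    """Run the candidate source and count how many test cases it breaks."""
    namespace: dict = {}
    try:
        exec(source, namespace)          # run code
        add = namespace["add"]
    except Exception:
        return len(CASES)                # code that won't load fails everything
    failures = 0
    for args, expected in CASES:         # check where it breaks
        try:
            if add(*args) != expected:
                failures += 1
        except Exception:
            failures += 1
    return failures


def improve(source: str,
            propose_fix: Callable[[str, int], str],
            max_rounds: int = 10) -> str:
    """Keep a proposed fix only when it strictly reduces the failure count."""
    best, best_failures = source, count_failures(source)
    for _ in range(max_rounds):
        if best_failures == 0:
            break
        candidate = propose_fix(best, best_failures)   # model proposes a fix
        failures = count_failures(candidate)
        if failures < best_failures:                   # keep only what gets better
            best, best_failures = candidate, failures
    return best


# Canned proposals standing in for model output: one bad, one good.
_proposals = iter([
    "def add(a, b): return a * b",
    "def add(a, b): return a + b",
])


def canned_fix(source: str, failures: int) -> str:
    """Hypothetical 'model': returns the next canned proposal, else no change."""
    return next(_proposals, source)


result = improve("def add(a, b): return a - b", canned_fix)
print(result)
```

The first proposal is discarded because it fails as many cases as the starting code. Only the second, which passes every case, replaces it.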
@_PaperMoose_
Ryan
10 days
A lot of SF founders have been quietly terrified that AI will make us all irrelevant in 5 years. Recent talks from Ilya Sutskever and Andrej Karpathy finally broke that doom spiral. Ilya said the “just scale compute and data” era is ending. Models still generalize far worse
0
0
0
@_PaperMoose_
Ryan
13 days
Opus 4.5 is my favorite model right now. Perfect blend of speed and intelligence.
0
0
1
@_PaperMoose_
Ryan
14 days
Most tooling for LLM agents today is not evaluation tooling. It is observability tooling. Observability tells you what happened. Evaluation tells you what needs to change. Evals are how you refine your domain understanding. They help you spot recurring behavior patterns in your
0
0
2
@_PaperMoose_
Ryan
15 days
Vibe coded apps have a certain smell. Lots of purple, beveled components. Weird bugs at both the UI and backend layers. You can tell.
0
0
0