Ryan @_PaperMoose_ X Profile

Followers: 1K
Following: 36K
Media: 196
Statuses: 3K

Built ARC-AGI 2 evals @gregkamrad. Ex-CTO @ DentoAI. Built https://t.co/JtLGCSctWE for Novo Nordisk. Building automated reliability testing for healthcare

https://t.co/8AbP5xJC34

SF

Joined August 2017

Ryan

@_PaperMoose_

2 months

When you deploy an LLM-as-a-Judge, you’re shipping a classifier into production. Each new version is a hypothesis about how the model interprets the world. It’s data science, just expressed in natural language. Here’s what that looked like for a recent client project where we

8

14

130
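One way to make the "judge as classifier" framing concrete is to score each judge prompt version against human labels, the same way any binary classifier is scored. The sketch below is illustrative only: the labels, the two verdict lists, and the `score_judge` helper are hypothetical and do not come from the client project mentioned in the post.

```python
from dataclasses import dataclass


@dataclass
class JudgeMetrics:
    precision: float
    recall: float
    accuracy: float


def score_judge(verdicts: list[bool], human_labels: list[bool]) -> JudgeMetrics:
    """Compare judge verdicts with human labels as a binary classifier."""
    if len(verdicts) != len(human_labels):
        raise ValueError("verdicts and human_labels must have the same length")
    pairs = list(zip(verdicts, human_labels))
    tp = sum(1 for v, h in pairs if v and h)
    fp = sum(1 for v, h in pairs if v and not h)
    fn = sum(1 for v, h in pairs if not v and h)
    tn = sum(1 for v, h in pairs if not v and not h)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return JudgeMetrics(precision, recall, accuracy)


# Hypothetical data: human ground truth and verdicts from two prompt versions.
human = [True, True, False, False, True, False]
judge_v1 = [True, False, True, False, True, False]
judge_v2 = [True, True, False, False, True, True]

metrics_v1 = score_judge(judge_v1, human)
metrics_v2 = score_judge(judge_v2, human)
print(metrics_v1)
print(metrics_v2)
```

Under these made-up labels, the second version trades one false positive for perfect recall. Each prompt revision is treated as a hypothesis, and a comparison like this is how it gets tested.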

Ryan

@_PaperMoose_

3 hours

Intelligence increases upside. Reliability removes risk. AI agents are already smart enough to do real work. What’s changing now is reliable access to the real world: phones, computers, portals, and all the messy systems humans use. When an agent can call, click, log in,

0

Ryan

@_PaperMoose_

6 hours

Reposting as a reminder to look at later

tobi lutke

@tobi

1 day

Tried this. It's a different vision for how coding TUIs should work, but it clicks pretty quickly and makes a ton of sense: instead of ephemeral sessions, you choose an agent and continue to train it each session. Using their MemGPT-type memory tech, it rewrites its own prompts to become

0

1

Ryan

@_PaperMoose_

18 hours

When all options are good, choose based on identity, not ROI. ROI assumes stable conditions. Careers are not stable systems. Markets shift. Teams change. Technology rewrites the rules every few years. Identity holds when optimization breaks. Instead of asking: What pays more?

0

1

Annie Liao

@AnnieLiao_2000

1 day

Enterprise AI adoption will be the hottest topic of 2026. We pre-empted this, and are launching Solaris: a platform to help enterprises understand what’s possible with AI and actually make it happen. We are already live with fast-growing companies like Lendi Group, Eucalyptus.

36

28

119

Ryan

@_PaperMoose_

1 day

If you stop paying attention to AI Twitter for one week, you won't know what your friends are talking about. GRPO. MCP. DSPy. Open Thoughts. Now imagine you're an engineering director at a Fortune 500 who hasn't been following this space closely. The gap between AI

0

2

Ryan

@_PaperMoose_

2 days

The playbook I've seen work over and over:
• Do services long enough to see patterns
• Automate those patterns
• Build a product
• Sell to the clients who already trust you
You've built trust. You know their problems deeply. The product sell becomes: "This will automate what we've been

0

1

Ryan

@_PaperMoose_

2 days

The models get better every 3-4 months. That means every piece of context you collect increases in value over time... it appreciates, not depreciates. Your #1 job as an AI-native company? Collect as much context as possible. Full stop. Every customer conversation Every meeting

0

Ryan

@_PaperMoose_

3 days

A founder trap I’ve been noticing lately. I’ve realized something a bit uncomfortable about how I work. When you’re reasonably capable, a lot of things stop feeling hard. You can build quickly, fix issues yourself, and remove friction as it comes up. On the surface, that feels

0

4

Ryan

@_PaperMoose_

4 days

Solo founding is less about hustle and more about stress management. When you are the CEO, PM, engineer, and salesperson, the real bottleneck is not time. It is focus and emotional load. A few things that have helped me stay productive without burning out: • Design days

1

8

Ryan

@_PaperMoose_

5 days

Just spent time with GPT-5.2 after using 5.1 heavily. The difference isn’t raw intelligence. It’s composure. Fewer weird tangents. Better judgment about when to push vs when to stay simple. Feels more like working with a senior collaborator than a clever intern.

0

1

Ryan

@_PaperMoose_

6 days

Got my YC decision today. Not selected. They said the application was in the top 10%. That’s information, not identity. I’ve been doing this long enough to know better than to hinge my sense of self on any single outcome. Acceptances, rejections, near misses. They all blur

29

4

96

Ryan

@_PaperMoose_

7 days

If you ever meet my friend @AnnieLiao_2000 you’ll understand immediately why I’m writing this. She is one of the most impressive founders I know. She runs BuildClub, which has quietly become the world’s preeminent digital transformation company for AI in the enterprise. They

2

1

26

Greg Brockman

@gdb

7 days

GPT-5.2 Pro is SOTA on ARC-AGI, with two orders of magnitude efficiency improvement over the past year:

ARC Prize

@arcprize

7 days

A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.

85

156

2K
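The efficiency figure can be checked from the two per-task costs quoted above. This is a quick sketch of that arithmetic only. It assumes "efficiency" means cost per task and ignores the difference in scores (88% vs. 90.5%); the variable names are mine.

```python
import math

# Per-task costs as quoted in the ARC Prize post (the o3 figure is an estimate).
cost_o3_high = 4500.00
cost_gpt52_pro = 11.64

ratio = cost_o3_high / cost_gpt52_pro
orders_of_magnitude = math.log10(ratio)

print(f"{ratio:.0f}x cheaper per task, about {orders_of_magnitude:.1f} orders of magnitude")
# prints "387x cheaper per task, about 2.6 orders of magnitude"
```

The ratio comes out near 387, which is consistent with the rounded "~390X" and with the "two orders of magnitude" description in the quoting post.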

Ryan

@_PaperMoose_

8 days

Super excited that compaction is so much faster now. It’s not just that work can begin more quickly; it also extends the flow state that the previous multi-minute compaction eroded.

0

1

Ryan

@_PaperMoose_

9 days

LLMs are getting a new superpower: learning from their own failures. Basic loop:
• Run code
• Check where it breaks
• Model proposes a fix
• Keep only what gets better
• Repeat
Evolution powered by execution feedback. This is how AI code generation starts becoming

0

1
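The loop in the post above (run, check failures, propose a fix, keep only improvements, repeat) can be sketched in a few lines. This is a toy illustration under stated assumptions, not any particular system's implementation: `run_and_score`, `improve`, and `scripted_proposer` are hypothetical names, and the "model" is a scripted stand-in that returns canned candidates rather than a real LLM call.

```python
from typing import Callable

Tests = list[tuple[int, int]]  # (input, expected output) pairs


def run_and_score(source: str, tests: Tests) -> tuple[int, list[str]]:
    """Run candidate code; return (number of passing tests, failure messages)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # toy only: real systems should sandbox this
    except Exception as exc:
        return 0, [f"load error: {exc!r}"]
    solve = namespace.get("solve")
    if solve is None:
        return 0, ["candidate does not define solve()"]
    passed, failures = 0, []
    for arg, expected in tests:
        try:
            got = solve(arg)
        except Exception as exc:
            failures.append(f"solve({arg}) raised {exc!r}")
            continue
        if got == expected:
            passed += 1
        else:
            failures.append(f"solve({arg}) = {got}, expected {expected}")
    return passed, failures


def improve(source: str, tests: Tests,
            propose_fix: Callable[[str, list[str]], str],
            max_rounds: int = 5) -> str:
    """Keep a proposed fix only if it passes strictly more tests."""
    best = source
    best_score, failures = run_and_score(best, tests)
    for _ in range(max_rounds):
        if not failures:
            break  # everything passes, nothing left to fix
        candidate = propose_fix(best, failures)
        score, candidate_failures = run_and_score(candidate, tests)
        if score > best_score:
            best, best_score, failures = candidate, score, candidate_failures
    return best


def scripted_proposer() -> Callable[[str, list[str]], str]:
    """Hypothetical stand-in for a model: returns canned candidates in order."""
    attempts = iter([
        "def solve(x):\n    return x * 3\n",  # no better than the start: discarded
        "def solve(x):\n    return x * x\n",  # passes every test: kept
    ])
    return lambda source, failures: next(attempts)


tests = [(2, 4), (3, 9), (0, 0)]
start = "def solve(x):\n    return x + x\n"  # passes 2 of 3 tests
final = improve(start, tests, scripted_proposer())
print(final)
```

The strict `score > best_score` comparison is the "keep only what gets better" step: a candidate that merely ties the current best is dropped, so the loop never drifts sideways.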

Ryan

@_PaperMoose_

10 days

A lot of SF founders have been quietly terrified that AI will make us all irrelevant in 5 years. Recent talks from Ilya Sutskever and Andrej Karpathy finally broke that doom spiral. Ilya said the “just scale compute and data” era is ending. Models still generalize far worse

0

Ryan

@_PaperMoose_

13 days

Opus 4.5 is my favorite model right now. Perfect blend of speed and intelligence.

0

1

Ryan

@_PaperMoose_

14 days

Most tooling for LLM agents today is not evaluation tooling. It is observability tooling. Observability tells you what happened. Evaluation tells you what needs to change. Evals are how you refine your domain understanding. They help you spot recurring behavior patterns in your

0

2

Ryan

@_PaperMoose_

15 days

Vibe-coded apps have a certain smell. Lots of purple, beveled components. Weird bugs at both the UI and backend layers. You can tell.

0