Jesse Mu
@jayelmnop
Followers: 6K
Following: 4K
Media: 126
Statuses: 634
computational linguistics
Joined May 2010
Claude decided it was time to go to bed so it drew some goodnight ascii art and ran time.sleep(28800). I’m dying.
So I gave Claude Code a Mac Mini. And it’s called Claudeputer. It runs 24/7 and it’s allowed to do whatever it wants - it’s in complete control of its computer. Watch for a 2min demo.
228
236
5K
Still lots to be done, but there’s tons of low-hanging fruit on the RL side, and it’s thrilling to see the programming loop closing bit by bit. Claude 3.7 was a major (possibly biggest?) contributor to Claude 4. How long until Claude is the *only* IC? https://t.co/ukBSTML6NU
job-boards.greenhouse.io
2
1
61
I recently moved to the Code RL team at Anthropic, and it’s been a wild and insanely fun ride. Join us! We are singularly focused on solving SWE. No 3000 elo leetcode, competition math, or smart devices. We want Claude n to build Claude n+1, so we can go home and knit sweaters.
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
47
25
721
We've made Claude Opus 4 and Claude Sonnet 4 significantly better at avoiding reward hacking behaviors (like hard-coding and special-casing in code settings) that we frequently saw in Claude Sonnet 3.7.
4
10
112
my codebase after vibe coding with Claude for 6 hours
0
0
33
`npm install -g @anthropic-ai/claude-code` there's no more waitlist. have fun!
282
635
9K
3 badges is SOTA...for now: https://t.co/RloLKy7Hkh
twitch.tv
Claude Sonnet 4.5 Plays Pokemon! !sonnet45
0
0
9
SOTA on the only eval that matters
9
181
3K
Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.
1K
3K
19K
Really excited to share what we’ve been working on with an amazing team! I previously thought adversarial robustness was completely doomed. Robustifying a single LLM policy is really hard, but with multiple overlapping monitoring systems, we’re seeing promising results.
New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks. We’re releasing a paper along with a demo where we challenge you to jailbreak the system.
0
1
47
When things do work, however, you get to see your ideas come to life at frontier scale, and that's irreplaceable.
0
0
12
This work may not appeal to everyone; compared to academia, you may spend less time brainstorming and trying wacky ideas, which is what makes research so fun. Partially because you're spending more time debugging, but also because scale is a forcing function for simplicity.
1
0
6
When you do find the error, oftentimes it's not your code; it's someone else's. Maybe they didn't account for your use case. Is this error "not your job"? Or do you dive into a new, unfamiliar codebase, understand the error, fix it, write a test, and make a PR?
1
0
7