Jesse Mu

@jayelmnop

Followers
6K
Following
4K
Media
126
Statuses
634

computational linguistics

Joined May 2010
@jayelmnop
Jesse Mu
2 months
Better than opus 4.1 at sonnet prices - enjoy!!
@claudeai
Claude
2 months
Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
0
0
6
@mckaywrigley
Mckay Wrigley
4 months
Claude decided it was time to go to bed so it drew some goodnight ascii art and ran time.sleep(28800). I’m dying.
@mckaywrigley
Mckay Wrigley
4 months
So I gave Claude Code a Mac Mini. And it’s called Claudeputer. It runs 24/7 and it’s allowed to do whatever it wants - it’s in complete control of its computer. Watch for a 2min demo.
228
236
5K
@AnthropicAI
Anthropic
6 months
THE WAY OF CODE, a project by @rickrubin in collaboration with Anthropic:
474
1K
11K
@jayelmnop
Jesse Mu
6 months
Still lots to be done, but there’s tons of low hanging fruit on the RL side, and it’s thrilling to see the programming loop closing bit by bit. Claude 3.7 was a major (possibly biggest?) contributor to Claude 4. How long until Claude is the *only* IC? https://t.co/ukBSTML6NU
job-boards.greenhouse.io
2
1
61
@jayelmnop
Jesse Mu
6 months
I recently moved to the Code RL team at Anthropic, and it’s been a wild and insanely fun ride. Join us! We are singularly focused on solving SWE. No 3000 elo leetcode, competition math, or smart devices. We want Claude n to build Claude n+1, so we can go home and knit sweaters.
@AnthropicAI
Anthropic
6 months
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
47
25
721
@sprice354_
Sara Price
6 months
We've made Claude Opus 4 and Claude Sonnet 4 significantly better at avoiding reward hacking behaviors (like hard-coding and special-casing in code settings) that we frequently saw in Claude Sonnet 3.7.
4
10
112
@jayelmnop
Jesse Mu
8 months
another Eval That Actually Matters
@akbirkhan
akbir.
8 months
In the spirit of making more real world evals, here is the Factorio Learning Environment (FLE). Spurred by wanting to eval if models are good paperclip maximisers, we check how well agents build factories for other things 🏗️🏭🛠️
0
0
8
@jayelmnop
Jesse Mu
8 months
my codebase after vibe coding with Claude for 6 hours
@adonis_singh
adi
8 months
prompt "the concept of a memory" left: sonnet 3.7 right: gpt4.5
0
0
33
@_catwu
cat
9 months
`npm install -g @anthropic-ai/claude-code` there's no more waitlist. have fun!
282
635
9K
@jayelmnop
Jesse Mu
9 months
this is an extremely accurate simulation of what it was like to work on 3.7 sonnet. we are hiring
@wenquai
zack
9 months
Claude 3.7 Sonnet created this 3D “Anthropic Researcher Simulator” in one shot 🤯 Prompt below
4
1
75
@alexalbert__
Alex Albert
9 months
"why isn't claude 3.7 sonnet better at esoteric competition math problems" we found it didn't generalize to becoming a pokemon master
@AnthropicAI
Anthropic
9 months
A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem. Can Claude play Pokémon? A thread:
44
43
1K
@jayelmnop
Jesse Mu
9 months
I mean, surely you saw this coming 😛
@AnthropicAI
Anthropic
9 months
Claude Plays Pokémon continues on as a researcher's personal project. Follow along on Twitch:
1
0
21
@jayelmnop
Jesse Mu
9 months
SOTA on the only eval that matters
@sokadv
Saurav Kadavath
9 months
Claude 3.7 plays Pokemon! https://t.co/PNGw6pAcx4
9
181
3K
@AnthropicAI
Anthropic
9 months
Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.
1K
3K
19K
@jayelmnop
Jesse Mu
10 months
Really excited to share what we’ve been working on with an amazing team! I previously thought adversarial robustness was completely doomed. Robustifying a single LLM policy is really hard, but with multiple overlapping monitoring systems, we’re seeing promising results.
@AnthropicAI
Anthropic
10 months
New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks. We’re releasing a paper along with a demo where we challenge you to jailbreak the system.
0
1
47
@jayelmnop
Jesse Mu
11 months
When things do work, however, you get to see your ideas come to life at frontier scale, and that's irreplaceable.
0
0
12
@jayelmnop
Jesse Mu
11 months
This work may not appeal to everyone; compared to academia, you may spend less time brainstorming and trying wacky ideas, which is what makes research so fun. Partially because you're spending more time debugging, but also because scale is a forcing function for simplicity.
1
0
6
@jayelmnop
Jesse Mu
11 months
When you do find the error, oftentimes it's not your code; it's someone else's. Maybe they didn't account for your use case. Is this error "not your job?" Or do you dive into a new, unfamiliar codebase, understand the error, fix it, write a test, and make a PR?
1
0
7