Jesse Mu @jayelmnop X Profile

Jesse Mu

@jayelmnop

Followers

6K

Following

4K

Media

126

Statuses

634

computational linguistics

https://t.co/FfC57V9Vee

Joined May 2010


Jesse Mu

@jayelmnop

2 months

Better than opus 4.1 at sonnet prices - enjoy!!

Claude

@claudeai

2 months

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

0

6

Mckay Wrigley

@mckaywrigley

4 months

Claude decided it was time to go to bed so it drew some goodnight ascii art and ran time.sleep(28800). I’m dying.

Mckay Wrigley

@mckaywrigley

4 months

So I gave Claude Code a Mac Mini. And it’s called Claudeputer. It runs 24/7 and it’s allowed to do whatever it wants - it’s in complete control of its computer. Watch for a 2min demo.

228

236

5K

Anthropic

@AnthropicAI

6 months

THE WAY OF CODE, a project by @rickrubin in collaboration with Anthropic:

474

1K

11K

Jesse Mu

@jayelmnop

6 months

Still lots to be done, but there’s tons of low hanging fruit on the RL side, and it’s thrilling to see the programming loop closing bit by bit. Claude 3.7 was a major (possibly biggest?) contributor to Claude 4. How long until Claude is the *only* IC? https://t.co/ukBSTML6NU

job-boards.greenhouse.io

2

1

61

Jesse Mu

@jayelmnop

6 months

I recently moved to the Code RL team at Anthropic, and it’s been a wild and insanely fun ride. Join us! We are singularly focused on solving SWE. No 3000 elo leetcode, competition math, or smart devices. We want Claude n to build Claude n+1, so we can go home and knit sweaters.

Anthropic

@AnthropicAI

6 months

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

47

25

721

Sara Price

@sprice354_

6 months

We've made Claude Opus 4 and Claude Sonnet 4 significantly better at avoiding reward hacking behaviors (like hard-coding and special-casing in code settings) that we frequently saw in Claude Sonnet 3.7.

4

10

112

Jesse Mu

@jayelmnop

8 months

another Eval That Actually Matters

akbir.

@akbirkhan

8 months

In the spirit of making more real world evals, here is the Factorio Learning Environment (FLE). Spurred by wanting to eval if models are good paperclip maximisers, we check how well agents build factories for other things 🏗️🏭🛠️

0

8

Jesse Mu

@jayelmnop

8 months

my codebase after vibe coding with Claude for 6 hours

adi

@adonis_singh

8 months

prompt "the concept of a memory" left: sonnet 3.7 right: gpt4.5

0

33

cat

@_catwu

9 months

`npm install -g @anthropic-ai/claude-code` there's no more waitlist. have fun!

282

635

9K

Jesse Mu

@jayelmnop

9 months

this is an extremely accurate simulation of what it was like to work on 3.7 sonnet. we are hiring

zack

@wenquai

9 months

Claude 3.7 Sonnet created this 3D “Anthropic Researcher Simulator” in one shot 🤯 Prompt below

4

1

75

Jesse Mu

@jayelmnop

9 months

3 badges is SOTA...for now: https://t.co/RloLKy7Hkh

twitch.tv

Claude Sonnet 4.5 Plays Pokemon! !sonnet45

0

9

Alex Albert

@alexalbert__

9 months

"why isn't claude 3.7 sonnet better at esoteric competition math problems" we found it didn't generalize to becoming a pokemon master

Anthropic

@AnthropicAI

9 months

A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem. Can Claude play Pokémon? A thread:

44

43

1K

Jesse Mu

@jayelmnop

9 months

I mean, surely you saw this coming 😛

Anthropic

@AnthropicAI

9 months

Claude Plays Pokémon continues on as a researcher's personal project. Follow along on Twitch:

1

0

21

Jesse Mu

@jayelmnop

9 months

SOTA on the only eval that matters

Saurav Kadavath

@sokadv

9 months

Claude 3.7 plays Pokemon! https://t.co/PNGw6pAcx4

9

181

3K

Anthropic

@AnthropicAI

9 months

Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.

1K

3K

19K

Jesse Mu

@jayelmnop

10 months

Really excited to share what we’ve been working on with an amazing team! I previously thought adversarial robustness was completely doomed. Robustifying a single LLM policy is really hard, but with multiple overlapping monitoring systems, we’re seeing promising results.

Anthropic

@AnthropicAI

10 months

New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks. We’re releasing a paper along with a demo where we challenge you to jailbreak the system.

0

1

47

Edgar Shaghoulian

@eshaghoulian

11 months

https://t.co/MuMgadqGeK

arxiv.org

"Pasta alla Cacio e pepe" is a traditional Italian dish made with pasta, pecorino cheese, and pepper. Despite its simple ingredient list, achieving the perfect texture and creaminess of the...

6

67

289

Jesse Mu

@jayelmnop

11 months

When things do work, however, you get to see your ideas come to life at frontier scale, and that's irreplaceable.

0

12

Jesse Mu

@jayelmnop

11 months

This work may not appeal to everyone; compared to academia, you may spend less time brainstorming and trying wacky ideas, which is what makes research so fun. Partially because you're spending more time debugging, but also because scale is a forcing function for simplicity.

1

0

6

Jesse Mu

@jayelmnop

11 months

When you do find the error, oftentimes it's not your code; it's someone else's. Maybe they didn't account for your use case. Is this error "not your job?" Or do you dive into a new, unfamiliar codebase, understand the error, fix it, write a test, and make a PR?

1

0

7