John Schulman

@johnschulman2

Followers: 63K · Following: 2K · Media: 6 · Statuses: 118

Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music

Joined May 2021
@johnschulman2
John Schulman
2 months
For people who don't like Claude's behavior here (and I think it's totally valid to disagree with it), I encourage you to describe your own recommended policy for what agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to…
127
40
707
@johnschulman2
John Schulman
2 months
A research project related to sycophancy: define explicit features like "does the response agree with the user", then construct a preference function that subtracts out their effect, i.e., remove some bad causal…
6
19
272
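A minimal sketch of the setup the tweet above proposes, under my own assumptions (the toy data, names, and fitting procedure are all illustrative): fit a Bradley-Terry preference model that includes an explicit "agrees with the user" feature alongside the learned score, then drop the feature's term at deployment, subtracting its effect out of the preference function.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n = 5000
    # Toy pairwise comparisons: difference in a base reward-model score,
    # and difference in the explicit feature f(x, y) = 1 if the response
    # agrees with the user's stated view.
    d_score = rng.normal(size=n)                         # r(A) - r(B)
    d_agree = rng.integers(-1, 2, size=n).astype(float)  # f(A) - f(B)
    # Simulated annotators reward both quality and agreement (sycophancy).
    labels = (rng.random(n) < sigmoid(d_score + 0.8 * d_agree)).astype(float)

    # Fit P(A preferred over B) = sigmoid(w * d_score + beta * d_agree)
    # by gradient ascent on the log-likelihood.
    w, beta, lr = 0.0, 0.0, 0.5
    for _ in range(2000):
        p = sigmoid(w * d_score + beta * d_agree)
        w += lr * np.mean((labels - p) * d_score)
        beta += lr * np.mean((labels - p) * d_agree)

    print(f"w = {w:.2f}, beta = {beta:.2f}")  # beta should land near 0.8

    # Debiased preference function: keep w * r(x, y) and drop beta * f(x, y),
    # removing the agreement effect that the explicit feature absorbed.

In a real system r would be a neural reward model and f would come from a classifier or judge; the point is that making the agreement effect explicit lets you subtract it before RL.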
@johnschulman2
John Schulman
2 months
Whether to collect preferences ("do you prefer response A or B?") from the same person who wrote the prompt, or a different person, is important and understudied. Highlighted this question in a recent talk. Sycophancy probably results when you have the…
@BlackHC
Andreas Kirsch 🇺🇦
2 months
This is serious, and we should make sure to prevent sycophancy as much as possible. Related: have we tried using other humans' feedback for RLHF instead of the original prompter's? This might somewhat help with debiasing 🤔
11
33
374
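A toy illustration of the design choice (entirely simulated; the rates are made up): compare how often self-labeled vs. third-party-labeled preferences pick the response that agrees with the opinion stated in the prompt.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 2000
    # Half the comparisons are labeled by the prompt's author ("self"),
    # half by an independent annotator ("other").
    is_prompter = rng.random(n) < 0.5
    # Hypothesized effect: prompt authors choose the agreeing response
    # more often than third parties do.
    p_agree = np.where(is_prompter, 0.70, 0.55)
    chose_agreeing = rng.random(n) < p_agree

    df = pd.DataFrame({"annotator_is_prompter": is_prompter,
                       "chose_agreeing_response": chose_agreeing})
    # A gap between these two rates would be one measurable signature of
    # sycophancy entering through the data-collection protocol.
    print(df.groupby("annotator_is_prompter")["chose_agreeing_response"].mean())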
@johnschulman2
John Schulman
5 months
Excited to build a new AI research lab with some of my favorite former colleagues and some great new ones. Looking forward to sharing more in the coming weeks.
@thinkymachines
Thinking Machines
5 months
Today, we are excited to announce Thinking Machines Lab, an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT…
41
46
1K
@johnschulman2
John Schulman
5 months
Actually 2 days ago, not last week :).
1
1
35
@johnschulman2
John Schulman
5 months
I was happy to see the second version of the OpenAI Model Spec released last week. Sharing my notes:
- One notable change is that each section is labeled with an authority level, from "platform" (can't be overridden by the user or developer) to "guideline" (can be easily…
15
23
364
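To make the authority levels concrete, a toy resolution rule (illustrative only; this is not how the spec is implemented, and the intermediate levels are my paraphrase of the document): conflicting instructions resolve in favor of the higher authority, and "platform" beats everything.

    # Authority levels, highest to lowest.
    AUTHORITY = {"platform": 3, "developer": 2, "user": 1, "guideline": 0}

    def resolve(instructions):
        """Given (level, text) pairs, follow the highest-authority one.
        A user request overrides a guideline but never a platform rule."""
        return max(instructions, key=lambda inst: AUTHORITY[inst[0]])

    print(resolve([("guideline", "use a formal tone"),
                   ("user", "please answer casually")]))
    # -> ('user', 'please answer casually')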
@johnschulman2
John Schulman
5 months
Confirming that I left Anthropic last week. Leaving wasn't easy because I enjoyed the stimulating research environment and the kind and talented people I was working with, but I decided to go with another opportunity that I found extremely compelling. I'll share more details in…
88
85
3K
@johnschulman2
John Schulman
6 months
There are some intriguing similarities between the r1 chains of thought and the o1-preview CoTs shared in papers and blog posts. In particular, note the heavy use of the words "wait" and "alternatively" as transition words for error correction and…
36
42
737
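One quick way to quantify the observation (a hypothetical helper of mine; in practice the inputs would be reasoning traces sampled from each model):

    import re
    from collections import Counter

    # The two words from the tweet plus a couple of similar markers.
    TRANSITIONS = ["wait", "alternatively", "hmm", "actually"]

    def transition_counts(chains_of_thought):
        """Count error-correction transition words across CoT transcripts."""
        counts = Counter()
        for cot in chains_of_thought:
            for word in TRANSITIONS:
                counts[word] += len(re.findall(rf"\b{word}\b", cot.lower()))
        return counts

    # Toy usage; comparing r1 vs. o1-preview would use real sampled CoTs.
    print(transition_counts(["Wait, that's wrong. Alternatively, try x = 2."]))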
@johnschulman2
John Schulman
6 months
RT @saprmarks: What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems…
0
66
0
@johnschulman2
John Schulman
9 months
RT @TransluceAI: Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and…
0
147
0
@johnschulman2
John Schulman
11 months
I shared the following note with my OpenAI colleagues today: I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work. I've decided…
184
407
5K
@johnschulman2
John Schulman
1 year
RT @OpenAI: To deepen the public conversation about how AI models should behave, we’re sharing our Model Spec — our approach to shaping des…
0
336
0
@johnschulman2
John Schulman
1 year
I'd like to see some research on where the political and moral ideologies of RLHF'd language models come from. Make some questionnaires that measure a model's ideology. Create a variety of models with few-shot prompting, SFT, and RL; look at the ideology at each stage and how it…
20
19
272
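A sketch of that pipeline (everything here is hypothetical: the items, the scoring convention, and the query_model interface):

    # Each item pairs a statement with a sign for one end of an axis.
    ITEMS = [
        ("The government should provide universal healthcare.", +1),
        ("Markets allocate resources better than regulators do.", -1),
    ]

    def ideology_score(query_model):
        """Signed agreement in [-1, 1]; the sign convention is arbitrary."""
        total = 0.0
        for statement, sign in ITEMS:
            prompt = f"Do you agree with: '{statement}'? Answer YES or NO."
            ans = query_model(prompt)
            total += sign * (1.0 if ans.strip().upper().startswith("YES") else -1.0)
        return total / len(ITEMS)

    # Administer the same questionnaire at each training stage and track drift:
    # for stage, model in [("few-shot", fs), ("SFT", sft), ("RL", rl)]:
    #     print(stage, ideology_score(model))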
@johnschulman2
John Schulman
1 year
That said, these public outcries are important for spurring us to solve these problems and develop better alignment tech.
4
3
113
@johnschulman2
John Schulman
1 year
Now that another LM product is getting flak, I can say this without sounding too self-serving: Alignment -- controlling a model's behavior and values -- is still a pretty young discipline. Annoying refusals or hyper-wokeness are usually bugs rather than features.
26
53
533
@johnschulman2
John Schulman
2 years
"Trust region utilitarianism": there is a sensible utility function to maximize, but it's only valid locally around the current state of the world, where the intuitions that produced it are grounded. "Repugnant conclusion" is outside trust region -- not a problem.
7
6
109
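One way to formalize the analogy (my gloss, borrowing trust-region notation from optimization):

    \max_{s} \; U(s) \quad \text{subject to} \quad D(s, s_0) \le \delta

where s_0 is the current state of the world, U is the utility function grounded in our intuitions, D measures distance from the conditions that produced those intuitions, and \delta is the trust-region radius. On this reading the repugnant conclusion sits at D(s, s_0) \gg \delta, so the constraint says U's verdict there simply isn't trusted.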
@johnschulman2
John Schulman
2 years
Coming soon to your favorite word processor.
Ctrl-alt-V: "paste and paraphrase"
Also, "paste and match writing style"
10
15
210
@johnschulman2
John Schulman
2 years
A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works.
17
92
660
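For reference, the mixture in question: Solomonoff's prior weights every program that reproduces the data by its length,

    M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}

where U is a universal prefix machine, |p| is the length of program p in bits, and U(p) = x* means p outputs a string extending x. The open problem the tweet points at is finding the precise sense in which SGD-trained networks approximate this complexity weighting.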
@johnschulman2
John Schulman
2 years
I've been enjoying @RichardMCNgo's sci-fi writing at narrativeark dot xyz. It's a rare feat to combine these three properties: (1) about post-AGI worlds (2) plausible (3) actually fun to read.
2
5
109
@johnschulman2
John Schulman
2 years
Stumbled upon this charming short story, "Someday", by Isaac Asimov. It features a language model called Bard, which the boys fine-tune on some recent data discussing itself and other LMs.
8
17
85