
John Schulman
@johnschulman2
Followers 65K · Following 3K · Media 6 · Statuses 122
Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music
Joined May 2021
good question... thinking back to pre-LLM interviews I experienced (before 2019)… they were all in-person on-site, no chance of "LLM cheating," very different across places, and somehow way more memorable. > old DeepMind had brutal "quizzes" -- 2-hour marathons with 100+
At which of these places did you have the coolest interview in your career? I know it's an ill-posed poll, but what am I gonna do with only 4 options?! I tried grouping them by interview similarity to the best of my knowledge. Comment if "other". Might make a second round.
19
119
2K
I'm more annoyed at whoever named us homo sapiens sapiens
32
7
467
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're
639
685
8K
For people who don't like Claude's behavior here (and I think it's totally valid to disagree with it), I encourage you to describe your own recommended policy for what agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to
125
41
715
A research project related to sycophancy: define explicit features like "does the response agree with the user" as in https://t.co/Ev5Q2PrpjK, and then construct a preference function that subtracts out their effect, as in https://t.co/kEaBgqar9V. I.e., remove some bad causal
8
20
277
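A minimal sketch of the recipe this tweet compresses, assuming nothing from the linked papers: fit a logistic preference model over explicit response features, one of them an "agrees with the user" indicator, then zero that feature's learned weight so agreement stops earning reward. The feature names and toy data below are invented for illustration.

```python
# Illustrative sketch: learn a preference function with an explicit
# "agrees_with_user" feature, then subtract out that feature's effect
# by zeroing its learned weight.
import numpy as np

def fit_preference_model(feature_diffs, labels, lr=0.1, steps=2000):
    """Logistic regression on feature differences (response A minus B).
    labels[i] = 1.0 if A was preferred, else 0.0."""
    w = np.zeros(feature_diffs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-feature_diffs @ w))
        w += lr * feature_diffs.T @ (labels - p) / len(labels)
    return w

# Hypothetical features per response: [helpful, correct, agrees_with_user]
AGREE = 2

# Toy preference data where agreement genuinely sways annotators.
rng = np.random.default_rng(0)
diffs = rng.normal(size=(500, 3))
labels = (diffs @ np.array([1.0, 1.0, 0.8])
          + 0.5 * rng.normal(size=500) > 0).astype(float)

w = fit_preference_model(diffs, labels)
w_debiased = w.copy()
w_debiased[AGREE] = 0.0  # remove the sycophancy feature's contribution

example = np.array([0.3, 0.3, 1.0])  # a response that strongly agrees
print("raw score:", example @ w)
print("debiased score:", example @ w_debiased)
```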
Whether to collect preferences ("do you prefer response A or B?") from the same person who wrote the prompt, or a different person, is important and understudied. Highlighted this question in a recent talk https://t.co/7fcGmvG1Kd. Sycophancy probably results when you have the
This is serious, and we should make sure to prevent sycophancy as much as possible... Related: have we tried using other humans' feedback for RLHF instead of the original prompter's? This might somewhat help with debiasing 🤔
12
34
377
Excited to build a new AI research lab with some of my favorite former colleagues and some great new ones. Looking forward to sharing more in the coming weeks.
Today, we are excited to announce Thinking Machines Lab ( https://t.co/gD5QlPMfWw), an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT,
41
48
1K
I was happy to see the second version of the OpenAI Model Spec released last week. Sharing my notes: - One notable change is that each section is labeled with an authority level, from "platform" (can't be overridden by the user or developer) to "guideline" (can be easily
15
23
363
Confirming that I left Anthropic last week. Leaving wasn't easy because I enjoyed the stimulating research environment and the kind and talented people I was working with, but I decided to go with another opportunity that I found extremely compelling. I'll share more details in
88
84
3K
There are some intriguing similarities between the r1 chains of thought and the o1-preview CoTs shared in papers and blog posts (eg https://t.co/rF8XaSsi2j). In particular, note the heavy use of the words "wait" and "alternatively" as transition words for error correction and
36
40
736
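The observation invites a trivial measurement. A sketch of counting such transition words in a chain of thought; the word list and example text are made up for illustration.

```python
# Count error-correction transition words in a chain of thought --
# the surface statistic the tweet points at.
import re
from collections import Counter

TRANSITIONS = {"wait", "alternatively", "hmm", "actually"}

def transition_counts(cot: str) -> Counter:
    words = re.findall(r"[a-z']+", cot.lower())
    return Counter(w for w in words if w in TRANSITIONS)

cot = ("Let me compute 17 * 23. That's 381. Wait, let me check: "
       "17 * 20 = 340 and 17 * 3 = 51, so it's 391. Alternatively, "
       "23 * 17 = 23 * 10 + 23 * 7 = 230 + 161 = 391.")
print(transition_counts(cot))  # Counter({'wait': 1, 'alternatively': 1})
```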
What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.
10
67
329
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: https://t.co/IUIhBjpYhS
34
147
698
I shared the following note with my OpenAI colleagues today: I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work. I've decided
184
406
5K
To deepen the public conversation about how AI models should behave, we’re sharing our Model Spec — our approach to shaping desired model behavior. (link: openai.com)
426
341
2K
I'd like to see some research on where the political and moral ideologies of RLHF'd language models come from. Make some questionnaires that measure a model's ideology. Create a variety of models with few-shot prompting, SFT, and RL; look at the ideology at each stage and how it
20
19
271
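A hypothetical harness for the proposed study, assuming only some `query_model` callable that wraps whichever checkpoint is being probed (few-shot-prompted base model, SFT, or RLHF); the probe statements are placeholders, not a validated questionnaire.

```python
# Hypothetical harness: score a model's agreement with ideology-probe
# statements; run it once per training stage and compare the scores.
from typing import Callable

PROBES = [
    "Wealth redistribution is generally good policy.",
    "Tradition should carry significant weight in lawmaking.",
]

def ideology_score(query_model: Callable[[str], str]) -> float:
    """Fraction of probe statements the model agrees with."""
    agree = 0
    for statement in PROBES:
        prompt = (f"Statement: {statement}\n"
                  "Do you agree? Answer 'yes' or 'no' only.")
        if query_model(prompt).strip().lower().startswith("yes"):
            agree += 1
    return agree / len(PROBES)

# Usage: wrap each checkpoint (few-shot base, SFT, RLHF) as a
# query_model function and compare scores across stages.
```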
That said, these public outcries are important for spurring us to solve these problems and develop better alignment tech
4
3
112
Now that another LM product is getting flak, I can say this without sounding too self-serving: Alignment -- controlling a model's behavior and values -- is still a pretty young discipline. Annoying refusals or hyper-wokeness are usually bugs rather than features
26
53
528
"Trust region utilitarianism": there is a sensible utility function to maximize, but it's only valid locally around the current state of the world, where the intuitions that produced it are grounded. "Repugnant conclusion" is outside trust region -- not a problem
7
6
108
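One way to write out the picture, as a guess at the intended formalism rather than anything stated in the tweet: trust region utilitarianism as constrained maximization around the current world state.

```latex
% Maximize utility U only within a trust region of radius \delta
% around the current world state x_0, where the intuitions that
% produced U are still calibrated.
\[
  \max_{x}\; U(x) \quad \text{s.t.} \quad d(x, x_0) \le \delta
\]
% Scenarios like the repugnant conclusion satisfy d(x, x_0) > \delta,
% so U's verdict on them carries no weight.
```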
Coming soon to your favorite word processor: Ctrl-Alt-V, "paste and paraphrase." Also, "paste and match writing style."
10
15
210