Rohin Shah

@rohinmshah

Followers: 5,500
Following: 89
Media: 17
Statuses: 313

Research Scientist at DeepMind. I publish the Alignment Newsletter.

London, UK
Joined October 2017
@rohinmshah
Rohin Shah
2 years
I'm hiring for Research Scientist and Research Engineer roles on the Alignment team at DeepMind! Come talk to me at the EA Global London career fair (4-8 pm Friday 15 April), or DM / email me and I can let you know when the job ad officially goes up.
6
50
481
@rohinmshah
Rohin Shah
5 years
Excited to share our work: collaboration requires understanding! In Overcooked, self-play doesn't gel with humans: it expects them to play like itself. (1/4) Demo: Blog: Paper: Code:
4
120
367
@rohinmshah
Rohin Shah
5 years
We've released our work on inferring human preferences from the state of the world! By thinking about what "must have happened" in the past, we can infer what should or shouldn't be done. Blog post: Paper: (1/4)
2
44
193
@rohinmshah
Rohin Shah
3 years
Yesterday we announced the BASALT competition. Why did we make it? TL;DR: It lets us test our ability to build agents that solve fuzzy tasks where rewards are hard to specify. (1/6) Participate: Paper: Blog:
3
34
139
@rohinmshah
Rohin Shah
2 years
Excited to release our work on Goal Misgeneralisation (GMG)! In GMG, a model trained with a correct reward function pursues undesirable goals. E.g. this agent competently gets negative rewards. Blog: Paper:
@GoogleDeepMind
Google DeepMind
2 years
An agent trained with a correct reward function might pursue an undesired goal if that goal correlates with reward during training. New research provides examples in this vital area of safety, like this ⬇️ blue agent that follows an adversarial red bot: 1/
10
54
275
4
28
134
@rohinmshah
Rohin Shah
1 year
We're hiring again, just like last year! Apply here: Research Scientist: Research Engineer:
@rohinmshah
Rohin Shah
2 years
I'm hiring for Research Scientist and Research Engineer roles on the Alignment team at DeepMind! Come talk to me at the EA Global London career fair (4-8 pm Friday 15 April), or DM / email me and I can let you know when the job ad officially goes up.
6
50
481
2
17
92
@rohinmshah
Rohin Shah
3 months
Despite the constant arguments on p(doom), many agree that *if* AI systems become highly capable in risky domains, *then* we ought to mitigate those risks. So we built an eval suite to see whether AI systems are highly capable in risky domains.
@tshevl
Toby Shevlane
3 months
In 2024, the AI community will develop more capable AI systems than ever before. How do we know what new risks to protect against, and what the stakes are? Our research team at @GoogleDeepMind built a set of evaluations to measure potentially dangerous capabilities: 🧵
Tweet media one
7
45
232
0
12
89
@rohinmshah
Rohin Shah
3 years
New #ICLR2021 paper by @davlindner , me, @pabbeel and @ancadianadragan , where we learn rewards from the state of the world. This HalfCheetah was trained from a single state sampled from a balancing policy! 💡 Blog: 📑 Paper: (1/5)
1
15
70
@rohinmshah
Rohin Shah
3 years
We’ve launched the MineRL BASALT competition on learning from human feedback! I’m especially excited about this because it mimics the situation we face in realistic settings much more than other benchmarks. I hope to see many of you participating!
@minerl_official
MineRL Project
3 years
We are excited to announce that the 2021 @NeurIPSConf MineRL BASALT Competition on Learning from Human Feedback has officially started! Sign up to participate here: Keep reading to learn more about the competition! 1/6
4
43
203
0
18
69
@rohinmshah
Rohin Shah
3 months
I loved working with Anca during my PhD, and now I get to do it again! Though there is one downside: she's going to notice how much more lax I've become about really nailing my talks and figures 😅
@ancadianadragan
Anca Dragan
3 months
So excited and so very humbled to be stepping in to head AI Safety and Alignment at @GoogleDeepMind . Lots of work ahead, both for present-day issues and for extreme risks in anticipation of capabilities advancing.
31
41
605
1
2
67
@rohinmshah
Rohin Shah
1 month
I've really liked @METR_Evals approach of rigorously evaluating the plausibility of concrete AI threat models, and saying in advance how to mitigate them. I'm excited that we at @GoogleDeepMind have now made our contribution!
@AllanDafoe
Allan Dafoe
1 month
As we push the boundaries of AI, it's critical that we stay ahead of potential risks. I'm thrilled to announce @GoogleDeepMind 's Frontier Safety Framework - our approach to analyzing and mitigating future risks posed by advanced AI models. 1/N
6
44
174
1
5
56
@rohinmshah
Rohin Shah
3 years
Do we need to put in active effort in order to align intelligent AI systems? In this work we formalize one of the arguments that says “yes”. Check it out at our NeurIPS poster session tomorrow (details in thread)!
@Turn_Trout
Alex Turner
3 years
#NeurIPS2021 spotlight: Optimal policies tend to seek power. Consider Pac-Man: Dying traps Pac-Man in one state forever, while staying alive lets him do more things. Our theorems show that for this reason, for most reward functions, it’s optimal for Pac-Man to stay alive. 🧵:
6
47
192
2
4
54
@rohinmshah
Rohin Shah
3 years
I'm a big fan of Minecraft as an AI research environment -- it's far more open and complex (like the real world) than Atari and MuJoCo. This competition is about obtaining diamonds, and soon we'll release another competition on "fuzzy" tasks without known rewards!
@minerl_official
MineRL Project
3 years
We are so excited to announce that the first round of the 2021 @NeurIPSConf MineRL Diamond Competition has officially started! 💎 Submit your agents here: We have some exciting new changes this year. Keep reading to learn more! 1/4
3
43
136
1
7
46
@rohinmshah
Rohin Shah
2 years
Example projects: (a) writing plausible stories of doom, (b) mechanistic interpretability for transformers, (c) empirical demos of inner alignment failures, (d) using causal diagrams to understand incentives, (e) formalizing "knowledge" and "agency"
2
2
46
@rohinmshah
Rohin Shah
1 year
I really enjoyed recording this podcast -- it's very different from my previous podcasts, much more focused on opinions and impressions, rather than specific technical points.
@robertwiblin
Rob Wiblin
1 year
Very happy with this episode I recorded with @rohinmshah of DeepMind's safety team which we just dropped. I ask for his personal opinions on all kinds of issues: • Case for and against slowing down • Where he disagrees with ML folks and LWers • More!
0
15
93
2
2
38
@rohinmshah
Rohin Shah
2 years
We aim for our research to influence DeepMind's AI work more broadly. We've seen this happen before: DeepMind now uses human feedback frequently, especially for language models. If that's interesting to you, @geoffreyirving is hiring for such roles!
@geoffreyirving
Geoffrey Irving
2 years
I will be at the EA Global London 2022 Career fair from 4 - 8 pm on Friday 15 April, and am hiring for roles on DeepMind's Scalable Alignment Team. Please say hello if you're interested in AGI safety and language models, both using LMs for safety and mitigating harms from LMs.
5
14
112
0
0
34
@rohinmshah
Rohin Shah
2 years
The job ads for the Alignment team at DeepMind are now up! Details about the team are in the quoted thread. We're hiring for: Research Scientists: Research Engineers:
@rohinmshah
Rohin Shah
2 years
I'm hiring for Research Scientist and Research Engineer roles on the Alignment team at DeepMind! Come talk to me at the EA Global London career fair (4-8 pm Friday 15 April), or DM / email me and I can let you know when the job ad officially goes up.
6
50
481
1
9
32
@rohinmshah
Rohin Shah
4 years
Wondering what the field of long-term AI safety does, but don't want to read hundreds of posts? Check out my review of work done in 2018-19! Please do leave comments and suggestions: The summary is also Alignment Newsletter #84 :
0
9
33
@rohinmshah
Rohin Shah
2 years
Our focus is on reducing the risks from AI that knowingly acts against the wishes of its designers. We focus on relatively neglected work - think Alignment Forum style conceptual work, or empirical work demonstrating the promise of so-far-untested alignment techniques.
1
1
31
@rohinmshah
Rohin Shah
10 months
Really excited that this work is finally out!
@VikrantVarma_
Vikrant Varma
10 months
Our latest paper () provides a general theory explaining when and why grokking (aka delayed generalisation) occurs – a theory so precise that we can predict hyperparameters that lead to partial grokking, and design interventions that reverse grokking! 🧵👇
Tweet media one
Tweet media two
14
201
1K
1
10
29
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #69 : Stuart Russell's new book on why we need to replace the standard model of AI -
1
7
27
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #53 - Newsletter turns one year old, and why overfitting isn't a huge problem for neural nets:
0
5
27
@rohinmshah
Rohin Shah
4 months
To estimate the impact of various parts of a network on observed behavior, by default you need a few forward passes *per part* -- very expensive. But it turns out you can efficiently approximate this with a few forward passes in total!
@JanosKramar
János Kramár
4 months
New @GoogleDeepmind mech interp research! 🎉 Can we massively speed up the process of finding important nodes in LLMs? Yes! Introducing AtP*, an improved variant of Attribution Patching (AtP) that beats all our baselines on efficiency and effectiveness. 🧵
Tweet media one
3
32
155
1
2
26
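[Editor's note: a minimal illustrative sketch of the attribution-patching idea the tweet describes, on a toy MLP. The model, metric, and inputs below are made up, and this is not the paper's AtP* implementation; it only shows the first-order approximation that replaces one patched forward pass per node with two forward passes and one backward pass in total.]

```python
import torch
import torch.nn as nn

# Approximate the effect of patching each hidden unit from a "corrupted" run
# into a "clean" run, to first order:
#   effect(node) ~= (a_corrupt - a_clean) * d(metric)/d(a_clean)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

x_clean, x_corrupt = torch.randn(1, 8), torch.randn(1, 8)

acts = {}
hook = model[1].register_forward_hook(lambda m, i, o: acts.update(hidden=o))

metric = model(x_clean).sum()      # forward pass 1 (clean)
a_clean = acts["hidden"]
a_clean.retain_grad()
metric.backward()                  # the single backward pass

with torch.no_grad():
    model(x_corrupt)               # forward pass 2 (corrupted)
a_corrupt = acts["hidden"].detach()
hook.remove()

# First-order estimate of how patching each hidden unit would change the metric.
atp_estimate = (a_corrupt - a_clean.detach()) * a_clean.grad
print(atp_estimate)
```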
@rohinmshah
Rohin Shah
5 years
In competitive games, the minimax theorem allows self-play to be agnostic to its opponent: if they are suboptimal, SP will crush them even harder. That doesn’t work in collaborative games, where the partner’s suboptimal move and SP’s failure to anticipate it will hurt. (2/4)
Tweet media one
2
3
19
@rohinmshah
Rohin Shah
2 years
This failure mode might look familiar. Often a system will perform badly outside of its training distribution! But the reason we care is that the system doesn’t just "break" – it *competently pursues an unintended goal*. This is a recipe for misalignment.
1
3
18
@rohinmshah
Rohin Shah
2 years
BASALT is running again -- and this time there's a pretrained Minecraft model for you to finetune!
@minerl_official
MineRL Project
2 years
Want to leverage the power of large, pre-trained models in tough sequential decision-making tasks? Participate in our @NeurIPSConf competition: The MineRL BASALT Competition on Fine-tuning from Human Feedback! 🎉It starts today! 🎉 Participate here: [1/10]
1
13
65
0
1
18
@rohinmshah
Rohin Shah
2 years
[AN #172 ] Sorry for the long hiatus! I'll restart in the near future; for now have a bunch of news -
0
2
18
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #47 - Why AI safety needs social scientists:
0
3
17
@rohinmshah
Rohin Shah
5 years
We need an agent that has the right “expectation” about its partner. Obvious solution: train a human model with behavior cloning, and then train an agent to play well with that model. This does way better than SP in simulation (i.e. evaluated against a “test” human model). (3/4)
Tweet media one
1
3
16
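[Editor's note: a minimal sketch of the recipe in the tweet above -- behaviour-clone a human model, freeze it, then train the agent to cooperate with it. The shapes, hyperparameters, and two-player `env` interface are hypothetical, and REINFORCE is used only for brevity; this is not the paper's training code.]

```python
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 64, 6

# Step 1: behaviour cloning on logged human (observation, action) pairs.
human_model = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
bc_opt = torch.optim.Adam(human_model.parameters(), lr=1e-3)

def bc_step(obs_batch, action_batch):
    loss = nn.functional.cross_entropy(human_model(obs_batch), action_batch)
    bc_opt.zero_grad(); loss.backward(); bc_opt.step()
    return loss.item()

# Step 2: train the agent against the frozen human model as its partner.
agent = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
agent_opt = torch.optim.Adam(agent.parameters(), lr=1e-4)

def train_episode(env):  # env: hypothetical two-player cooperative environment
    obs_agent, obs_human = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        dist = torch.distributions.Categorical(logits=agent(obs_agent))
        a_agent = dist.sample()
        with torch.no_grad():  # human model stays fixed during agent training
            a_human = torch.distributions.Categorical(logits=human_model(obs_human)).sample()
        (obs_agent, obs_human), shared_reward, done = env.step(a_agent.item(), a_human.item())
        log_probs.append(dist.log_prob(a_agent))
        rewards.append(shared_reward)
    loss = -sum(rewards) * torch.stack(log_probs).sum()  # REINFORCE on the shared return
    agent_opt.zero_grad(); loss.backward(); agent_opt.step()
```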
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #50 - How an AI catastrophe could occur, and an overview of AI policy from OpenAI researchers:
0
0
16
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #102 ]: Meta learning by GPT-3, and a list of full proposals for AI alignment -
1
3
16
@rohinmshah
Rohin Shah
2 years
[AN #173 ] Recent language model results from DeepMind -
0
1
15
@rohinmshah
Rohin Shah
11 months
There's nothing like delving deep into model internals for a specific behavior to appreciate how neural nets are simultaneously extremely structured and extremely messy.
@lieberum_t
Tom Lieberum
11 months
Mech interp has been very successful in tiny models, but does it scale? …Kinda! Our new @GoogleDeepMind paper studies how Chinchilla70B can do multiple-choice Qs, focusing on picking the correct letter. Small model techniques mostly work but it's messy!🧵
Tweet media one
3
43
224
0
1
15
@rohinmshah
Rohin Shah
3 years
Excited that this is finally happening! We've been working for quite a while on this benchmark for learning tasks without well-defined reward functions. Btw, we're looking for additional sponsors; let me know if you have any leads :)
@minerl_official
MineRL Project
3 years
This year, we have a sister competition, BASALT, which has also been accepted to NeurIPS! This competition focuses on training agents on tasks in which human feedback is necessary to specify what should be done. Stay tuned for more details!
2
7
30
0
2
14
@rohinmshah
Rohin Shah
2 months
Rose: The idea is extremely simple and well-motivated, and the effect sizes are large. Thorn: p=0.05 :( (Tbc, I am very confident we would have reached statistical significance for Gated SAEs being more interpretable, if we had a large enough N.)
@sen_r
Senthooran Rajamanoharan
2 months
New @GoogleDeepMind MechInterp work! We introduce Gated SAEs, a Pareto improvement over existing sparse autoencoders. They find equally good reconstructions with around half as many firing features, while maintaining interpretability (CI 0-13% improvement). Joint w/ @ArthurConmy
Tweet media one
5
26
163
0
2
15
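[Editor's note: a minimal sketch of a Gated SAE forward pass, based on the architecture the quoted tweet describes. The weight tying and dimensions are simplified assumptions; this is not the paper's code.]

```python
import torch
import torch.nn as nn

class GatedSAE(nn.Module):
    """Gate decides *whether* a feature fires; magnitude decides *how much*."""
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_gate = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.r_mag = nn.Parameter(torch.zeros(d_sae))   # ties W_mag = exp(r_mag) * W_gate
        self.b_gate = nn.Parameter(torch.zeros(d_sae))
        self.b_mag = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x):
        x_cent = x - self.b_dec
        pi_gate = x_cent @ self.W_gate.T + self.b_gate
        pi_mag = x_cent @ (torch.exp(self.r_mag)[:, None] * self.W_gate).T + self.b_mag
        f = (pi_gate > 0).float() * torch.relu(pi_mag)  # sparse feature activations
        x_hat = f @ self.W_dec.T + self.b_dec           # reconstruction
        return x_hat, f

sae = GatedSAE(d_model=16, d_sae=64)
x = torch.randn(4, 16)
x_hat, f = sae(x)
print(x_hat.shape, (f > 0).float().mean().item())  # reconstruction + fraction of firing features
```

Note that the gate's step function has zero gradient, so training needs extra care (the paper handles this with an auxiliary loss, omitted here).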
@rohinmshah
Rohin Shah
5 years
We developed Reward Learning by Simulating the Past (RLSP), and created a suite of gridworlds to showcase its properties. The top row shows what happens with a misspecified reward, while the bottom shows what happens when using RLSP to correct the reward. (4/4)
1
5
14
@rohinmshah
Rohin Shah
5 years
Real humans adapt to the opaque protocols that SP learns, and play differently from the naive behavior-cloned model that our agent was trained against, so the effect is smaller. Nonetheless, the human-aware agent still does better, sometimes beating human performance! (4/4)
Tweet media one
0
4
14
@rohinmshah
Rohin Shah
1 year
@v_maini Definitely the things other people have mentioned: (1) it's opinions of the alignment team in particular, not DM as a whole, and (2) it's optimized for quick-and-dirty communication (a slide deck) rather than a well-written post going into details
0
0
12
@rohinmshah
Rohin Shah
3 years
I’m excited about this benchmark because it gets very close to the situation we face when building AI systems in the real world: there isn’t a clear ground truth, and we need to figure out how to train our agents anyway. (The blog post lists a few more benefits as well.) (7/7)
0
1
12
@rohinmshah
Rohin Shah
2 years
More specifically, the agent learns to pursue a goal that leads to good behavior in training but bad behavior under distribution shift. (What distribution shift did we use in the example above? Check the blog post to find out!)
3
0
12
@rohinmshah
Rohin Shah
2 years
The paper has more examples, an elaboration on how GMG can lead to safety problems, and a discussion of future research directions that could ameliorate GMG. Check it out!
1
1
12
@rohinmshah
Rohin Shah
3 years
Current algorithms act as though their reward was handed down to them by God: they maximize it regardless of whether or not it accurately describes what we want. Instead, we want designers to communicate their preferences to AI agents in an ongoing manner. (2/7)
Tweet media one
2
0
11
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #103 ]: ARCHES: an agenda for existential safety, and combining natural language with deep RL -
0
3
11
@rohinmshah
Rohin Shah
4 years
Glad I'm getting at least one subfield; I had to give up on a bunch of others (like interpretability, which is truly massive) 😅
The amazing thing about reading @rohinmshah 's alignment newsletter is every key paper that came out in my small subfield in the last year has already been logged
0
0
2
0
0
11
@rohinmshah
Rohin Shah
4 years
Alignment Newsletter #79 : Recursive reward modeling as an alignment technique integrated with deep RL -
0
2
11
@rohinmshah
Rohin Shah
2 years
Consider (1) an AI system that does what we want, and (2) an AI system that is pursuing some misaligned goal, but knows that we would stop it, and so does what we want until it can make sure we can’t stop it. GMG tells us that it's possible to get (2).
1
2
11
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #145 ]: Our three year anniversary! -
0
0
11
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #57 : Why we should focus on robustness in AI safety, and the analogous problems in programming -
0
1
10
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #59 : How arguments for AI risk have changed over time -
0
3
10
@rohinmshah
Rohin Shah
2 years
GMG can apply to any kind of learning. Here, Gopher has to few-shot learn from the prompt’s “training examples” that it should clarify unknowns and then compute the answer. It generalizes from 2 unknowns to 1-3 unknowns, but… is oddly insistent… when there are no unknowns:
Tweet media one
1
1
10
@rohinmshah
Rohin Shah
5 years
The key idea is that the state of the world is already optimized for human preferences, and so we can infer those preferences. There are aspects of the environment that are "surprising" and could only have resulted from human effort; we should preserve those aspects. (2/4)
1
1
9
@rohinmshah
Rohin Shah
6 years
Alignment Newsletter #33 - Learning from both demos and preferences, and building a well-motivated AI instead of an AI with the right utility function:
0
2
9
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #51 - Cancelling within-batch generalization in order to get stable deep RL:
0
1
9
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #55 - Regulatory markets and international standards as a means of ensuring beneficial AI:
0
1
9
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #168 ]: Four technical topics for which Open Phil is soliciting grant proposals -
0
1
8
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #153 ]: Experiments that demonstrate failures of objective robustness -
0
0
8
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #39 - Using GANs for unrestricted adversarial examples:
0
2
8
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #48 - Quantilization: bounding worst case unintended consequences by partially imitating humans:
0
3
7
@rohinmshah
Rohin Shah
4 years
Alignment Newsletter #80 : Why AI risk might be solved without additional intervention from longtermists -
0
1
7
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #127 ]: Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment -
0
2
7
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #58 : Mesa optimization: what it is, and why we should care -
1
0
6
@rohinmshah
Rohin Shah
5 years
[Alignment Newsletter #77 ]: Double descent: a unification of statistical theory and modern ML practice -
0
0
7
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #152 ]: How we’ve overestimated few-shot learning capabilities -
0
0
7
@rohinmshah
Rohin Shah
2 years
[Alignment Newsletter #171 ]: Disagreements between alignment "optimists" and "pessimists" -
0
0
7
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #151 ]: How sparsity in the final layer makes a neural net debuggable -
0
0
7
@rohinmshah
Rohin Shah
3 years
@JeffLadish @EpistemicHope @ESYudkowsky Where "optimism" should be taken relative to Eliezer; I would register as wildly pessimistic relative to the "average" person. I do still work on AI alignment full-time.
0
0
7
@rohinmshah
Rohin Shah
5 years
0
0
7
@rohinmshah
Rohin Shah
3 years
We train a featurization using self-supervised learning on random rollout data, and use a linear reward on top of the features. Gradient ascent pushes the reward to produce behavior similar to what happened in the past. (4/5)
1
1
6
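[Editor's note: a rough numpy sketch of the gradient-ascent intuition in the tweet above, in the flavour of feature matching; the feature vectors are random placeholders and this is not the paper's exact update.]

```python
import numpy as np

# With a linear reward r(s) = w . phi(s), pushing w toward the average features
# of the simulated past and away from the average features of the current
# policy's rollouts makes "recreate the past" more rewarding.

rng = np.random.default_rng(0)
d = 8
w = np.zeros(d)

phi_past = rng.normal(size=(100, d))     # features of backward-simulated states (placeholder)
phi_policy = rng.normal(size=(100, d))   # features of states the current policy visits (placeholder)

def update_reward(w, phi_past, phi_policy, lr=0.1):
    grad = phi_past.mean(axis=0) - phi_policy.mean(axis=0)
    return w + lr * grad

w = update_reward(w, phi_past, phi_policy)
print(w)
```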
@rohinmshah
Rohin Shah
3 years
🎥 More videos on our website: 💡 Read our blog post: 📑 And the paper for details: 💻 Code: Visit our poster at #ICLR2021 poster session 10, Thursday (May 06) at 08:00 UTC. (5/5)
0
1
6
@rohinmshah
Rohin Shah
2 years
Also consider applying to the Scalable Alignment Team at DeepMind!
@geoffreyirving
Geoffrey Irving
2 years
We are hiring for 4 roles on DeepMind's Scalable Alignment Team, working on AI safety and language models: 1. RS - Machine Learning: 2. RS - Cognitive Science: 3. RE: 4. SWE: (1/n)
9
49
155
1
0
6
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #158 ]: Should we be optimistic about generalization? -
0
1
6
@rohinmshah
Rohin Shah
6 years
Alignment Newsletter #21 - What happens at AI Impacts, RL phrased as probabilistic inference, and autonomous AI in Google's data centers:
0
0
6
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #167 ]: Concrete ML safety problems and their relevance to x-risk -
0
1
6
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #38 , in which I arrogantly highlight my own interview. Also how compute affects AI timelines:
0
1
6
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #164 ]: How well can language models write code? -
0
0
6
@rohinmshah
Rohin Shah
5 years
In the room environment in the video, it is "surprising" that Alice never broke the vase. Even if she didn't care about the vase, she probably would have broken it at some point. So we can infer that she must care about the vase being intact. (3/4)
1
1
6
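[Editor's note: a toy illustration of this inference with made-up dynamics and numbers, not the RLSP algorithm itself. If breaking the vase carries a penalty, a Boltzmann-rational Alice walks carefully more often, so an intact vase after many steps is strong evidence for that penalty.]

```python
import numpy as np

T = 100                      # how long Alice has been acting
beta = 5.0                   # Boltzmann rationality
p_break_if_careless = 0.2    # chance a careless step breaks the vase
effort_saved = 0.1           # small per-step bonus for walking carelessly

def p_intact_after_T(w_vase):
    """Probability the vase is still intact after T steps, given Alice's penalty w_vase."""
    q_careful = 0.0
    q_careless = effort_saved + p_break_if_careless * w_vase
    p_careless = np.exp(beta * q_careless) / (np.exp(beta * q_careful) + np.exp(beta * q_careless))
    return (1.0 - p_careless * p_break_if_careless) ** T

hypotheses = {"doesn't care (w=0)": 0.0, "cares about the vase (w=-1)": -1.0}
likelihoods = {name: p_intact_after_T(w) for name, w in hypotheses.items()}
norm = sum(likelihoods.values())   # uniform prior over the two hypotheses
for name, lik in likelihoods.items():
    print(f"{name}: P(intact | w) = {lik:.2e}, posterior = {lik / norm:.3f}")
```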
@rohinmshah
Rohin Shah
3 years
[AN #130 ]: A new AI x-risk podcast, and reviews of the field -
0
0
6
@rohinmshah
Rohin Shah
4 years
I enjoyed talking about a bunch of different topics with Jeremie Harris of Towards Data Science!
@TDataScience
Towards Data Science
4 years
Effective altruism, AI safety, and learning human preferences from the state of the world with @rohinmshah and @jeremiecharris 🎧
1
2
3
1
2
6
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #44 - Random search vs. gradient descent on Goodharting, and attention is not all you need; recurrence helps too:
0
2
6
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #115 ]: AI safety research problems in the AI-GA framework -
0
2
5
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #93 ]: The Precipice we’re standing at, and how we can back away from it -
0
0
5
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #160 ]: Building AIs that learn and think like people -
0
2
5
@rohinmshah
Rohin Shah
3 years
To make this realistic, we want tasks where it is challenging to specify a reward function. This means we can’t have automatic evaluation. So, we specify tasks in natural language (“create a waterfall and take a scenic picture of it”), and have humans evaluate performance. (5/7)
Tweet media one
1
0
5
@rohinmshah
Rohin Shah
6 years
Alignment Newsletter #20 - Can curiosity by itself lead to good behavior?
0
0
5
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #105 ]: The economic trajectory of humanity, and what we might mean by optimization -
0
0
5
@rohinmshah
Rohin Shah
4 years
[AN #114 ]: Theory-inspired safety solutions for powerful Bayesian RL agents -
0
2
5
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #111 ]: The Circuits hypotheses for deep learning -
0
0
4
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #62 : Are adversarial examples caused by real but imperceptible features? -
0
2
5
@rohinmshah
Rohin Shah
3 years
To simulate the past, we learn an inverse dynamics model that predicts past states, and an inverse policy that predicts past actions. Chaining these together allows our algorithm to simulate the past. (3/5)
Tweet media one
1
1
5
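[Editor's note: a minimal sketch of the chaining described in the tweet above. In the paper the inverse policy and inverse dynamics model are learned neural nets; here they are trivial stand-in distributions so the snippet runs.]

```python
import torch
from torch.distributions import Categorical, Normal

def inverse_policy(s):                 # stand-in for "which action probably led to s?"
    return Categorical(logits=torch.zeros(4))

def inverse_dynamics(s, a):            # stand-in for "which state was that action taken from?"
    return Normal(s, 0.1)

def simulate_past(s_T, horizon):
    """Sample one plausible trajectory s_0, a_0, ..., s_T ending in the observed state."""
    states, actions = [s_T], []
    s = s_T
    for _ in range(horizon):
        a_prev = inverse_policy(s).sample()
        s_prev = inverse_dynamics(s, a_prev).sample()
        actions.append(a_prev)
        states.append(s_prev)
        s = s_prev
    return list(reversed(states)), list(reversed(actions))

states, actions = simulate_past(torch.zeros(3), horizon=5)
print(len(states), len(actions))   # 6 states, 5 actions
```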
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #60 : A new AI challenge: Minecraft agents that assist human players in creative mode -
0
2
5
@rohinmshah
Rohin Shah
3 years
How should we evaluate such algorithms? Since they teach the agent which task to do, we need an environment with many possible tasks. But this isn’t true of Atari / MuJoCo. For example, in Pong and Breakout, you hit the ball back, or you die. There are no other options. (3/7)
Tweet media one
1
0
5
@rohinmshah
Rohin Shah
5 years
Alignment Newsletter #61 : AI policy and governance, from two people in the field -
0
3
4
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #95 ]: A framework for thinking about how to make AI go well -
0
1
5
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #161 ]: Creating generalizable reward functions for multiple tasks by learning a model of functional similarity -
0
2
5
@rohinmshah
Rohin Shah
3 years
Minecraft is perfect for the task: there are a ton of different things to do. Here we can see people beating the Ender dragon, farming peacefully, practicing archery, and looting a bastion remnant. (4/7)
Tweet media one
1
0
5
@rohinmshah
Rohin Shah
3 years
[Alignment Newsletter #144 ]: How language models can also be finetuned for non-language tasks -
0
0
5
@rohinmshah
Rohin Shah
4 years
[Alignment Newsletter #82 ]: How OpenAI Five distributed their training computation -
0
0
5