Jacob Steinhardt
@JacobSteinhardt
Followers
10K
Following
222
Media
22
Statuses
428
Assistant Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI
Joined December 2011
In July, I went on leave from UC Berkeley to found @TransluceAI, together with Sarah Schwettmann (@cogconfluence). Now, our work is finally public.
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: https://t.co/IUIhBjpYhS
4
18
350
We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.
1
9
25
3→5, 4→6, 9→11, 7→? LLMs solve this via In-Context Learning (ICL); but how is ICL represented and transmitted in LLMs? We build new tools that identify “extractor” and “aggregator” subspaces for ICL, and use them to understand ICL addition tasks like the one above. Come to
6
36
207
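For readers unfamiliar with the setup in the post above: the in-context examples all follow the rule “add 2,” and the model must infer that rule from the prompt alone. Below is a minimal, hypothetical sketch of how such an ICL addition task can be posed to a language model; the model choice (gpt2) and the prompt format are illustrative assumptions, not the tools or setup from the Transluce work.

```python
# Minimal sketch of an in-context learning (ICL) addition task like "3→5, 4→6, 9→11, 7→?".
# Illustrative only: the model choice (gpt2) and prompt format are assumptions,
# not the setup used in the Transluce work.
from transformers import pipeline

# Few-shot examples that implicitly encode the rule "add 2".
examples = [(3, 5), (4, 6), (9, 11)]
query = 7

prompt = "".join(f"{x} -> {y}\n" for x, y in examples) + f"{query} -> "

generator = pipeline("text-generation", model="gpt2")
completion = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]

# A model that has picked up the in-context rule should continue with "9".
print(completion[len(prompt):])
```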
On our evals for HAL, we found that agents figure out they're being evaluated even on capability evals. For example, here Claude 3.7 Sonnet *looks up the benchmark on HuggingFace* to find the answer to an AssistantBench question. There were many such cases across benchmarks and
To make a model that *doesn't* instantly learn to distinguish between "fake-ass alignment test" and "normal task"... it seems like the first thing to do would be "make all alignment evals very small variations on actual capability evals." Do people do this?
2
12
41
Halcyon was instrumental in helping Transluce get off the ground, and Mike has been a great partner ever since! I definitely recommend them if you are founding an impact-focused org.
Exactly two years ago, I launched @HalcyonFutures. So far we’ve seeded and launched 16 new orgs and companies, and helped them raise nearly a quarter billion dollars in funding. Flash back to 2022: After eight years in VC, I stepped back to explore questions about exponential
0
1
16
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
1
13
72
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
5
38
245
How can we verify that AI ChemBio safety tests were properly run? Today we're launching STREAM: a checklist for more transparent eval results. I read a lot of model reports. Often they miss important details, like human baselines. STREAM helps make peer review more systematic.
2
16
82
@JacobSteinhardt, CEO of @TransluceAI, spoke on "Post-AGI Game Theory", i.e. how future AIs will influence their own development. He had a concrete proposal: flood the internet with high-quality examples of AI behavior acting on good values.
2
4
38
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
6
34
200
This Friday we're hosting "From Theory to Practice to Policy", a fireside chat between Yo Shavit (@yonashav) and Shafi Goldwasser. If you're local to SF and interested in the relationship between new technologies and policy, register to join! https://t.co/Or3R9E79uk
luma.com
Join Yonadav Shavit (OpenAI) and Shafi Goldwasser (UC Berkeley) for a discussion spanning theory, practice, and policy. Topics we'll discuss…
2
6
25
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us:
Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (https://t.co/wtratbvRnF)
Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
1
7
40
I was one of the 16 devs in this study. I wanted to share my opinions on the causes of, and mitigation strategies for, dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI speedup on my assigned issues. I think transparency helps the community.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
101
470
4K
Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 https://t.co/1HShAR6nub
luma.com
Transluce is hosting a happy hour at ICML 2025✨ Come meet members of our team and learn more about Transluce's vision and research. There will be drinks,…
1
7
39
Ever wondered how likely your AI model is to misbehave? We developed the *propensity lower bound* (PRBO), a variational lower bound on the probability of a model exhibiting a target (misaligned) behavior.
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
1
3
42
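For intuition on what a bound like the PRBO above could look like, here is a hedged sketch using the standard variational construction; the exact PRBO definition is in the linked update, and the symbols below (a prompt distribution p(x), a target behavior b, and a proposal distribution q(x), e.g. an investigator model) are illustrative assumptions.

```latex
% Hedged sketch: a generic variational lower bound of the kind described
% above, not necessarily the exact PRBO. p(x) is a prompt distribution,
% b the target (misaligned) behavior, q(x) a proposal (investigator) distribution.
\[
\log \Pr[b]
  \;=\; \log \mathbb{E}_{x \sim p}\bigl[\Pr(b \mid x)\bigr]
  \;\ge\; \mathbb{E}_{x \sim q}\!\left[\log \frac{p(x)\,\Pr(b \mid x)}{q(x)}\right],
\]
% by importance sampling and Jensen's inequality; the bound is tight when
% q(x) is proportional to p(x) * Pr(b | x).
```

Maximizing a bound of this form over q corresponds to training the proposal to find prompts that make the target behavior likely, which matches the role investigator agents play in surfacing such behaviors.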
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
5
34
168
Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch @LawZero_, a nonprofit dedicated to advancing safe-by-design AI. https://t.co/6VJecvaXYT
27
84
306
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
51
204
1K
Last day of PhD! I pioneered using LLMs to explain datasets & models. It's used by interp at @OpenAI and for societal impact at @AnthropicAI. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section =P
30
37
543
Synthetic Continued Pretraining (https://t.co/0epeIbxaLD) has been accepted as an Oral Presentation at #ICLR2025! We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.
Grab your favorite preprint of the week: how can you put its knowledge in your LM’s parameters? Continued pretraining (CPT) works well with >10B tokens, but the preprint is <10K tokens. Synthetic CPT downscales CPT to such small, targeted domains. 📜: https://t.co/nHblLT4YEy 🧵👇
1
12
82
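To make the recipe above concrete, here is a hedged sketch of the general idea, not the paper's exact method: expand the small corpus into many synthetic tokens with a generator LM, then continue pretraining the target LM on them with the standard causal-LM objective. The model names, the paraphrasing prompt, and the hyperparameters below are illustrative assumptions.

```python
# Hedged sketch of synthetic continued pretraining: expand a tiny corpus into
# synthetic training text with a generator LM, then continue pretraining a
# target LM on it. Model names, the prompt, and hyperparameters are
# illustrative assumptions, not the paper's exact procedure.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments, pipeline)

preprint_text = open("preprint.txt").read()  # the small (<10K-token) corpus

# Step 1: synthesize additional text grounded in the corpus.
generator = pipeline("text-generation", model="gpt2")  # stand-in generator
synthetic_docs = []
for _ in range(100):  # in practice, far more synthetic documents
    prompt = ("Explain the following passage in your own words:\n"
              + preprint_text[:1000] + "\n")
    out = generator(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"]
    synthetic_docs.append(out)

# Step 2: continued pretraining (causal-LM objective) on the synthetic corpus.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = (Dataset.from_dict({"text": synthetic_docs})
           .map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"]))
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(output_dir="synthetic-cpt", num_train_epochs=2,
                         per_device_train_batch_size=2, learning_rate=1e-5)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```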