Jacob Steinhardt Profile
Jacob Steinhardt

@JacobSteinhardt

Followers
9K
Following
185
Media
22
Statuses
412

Assistant Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI

Joined December 2011
@JacobSteinhardt
Jacob Steinhardt
7 months
In July, I went on leave from UC Berkeley to found @TransluceAI, together with Sarah Schwettmann (@cogconfluence). Now, our work is finally public.
@TransluceAI
Transluce
7 months
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann:
2
19
348
@JacobSteinhardt
Jacob Steinhardt
5 years
If our international students don't get a salary, I won't either. I pledge to donate my fall salary unless we fix U.S. immigration policy to allow international students (including incoming students) to be paid their stipend.
5
72
1K
@JacobSteinhardt
Jacob Steinhardt
3 years
My student Kayo Yin needs your help. Her visa has been unnecessarily delayed, which would prevent her from coming to UC Berkeley to start her studies. Despite bringing all required documents, the @StateDept refused to process the visa and it could take months to re-process.
25
306
1K
@JacobSteinhardt
Jacob Steinhardt
2 years
Many people, including me, have been surprised by recent developments in machine learning. To be less surprised in the future, we should make and discuss specific projections about future models. In this spirit, I predict properties of models in 2030:
22
116
539
@JacobSteinhardt
Jacob Steinhardt
3 years
In 2021, I created a forecasting prize to predict ML performance on benchmarks in June 2022 (and 2023, 2024, and 2025). June has ended, so we can see how the forecasters did:
5
91
489
@JacobSteinhardt
Jacob Steinhardt
3 years
This NYT article on Azalia and Anna's excellent chip design work is gross, to the point of journalistic malpractice. It platforms a bully while drawing an absurd parallel to @timnitGebru's firing. @CadeMetz should be ashamed. (not linking so it doesn't get more clicks)
16
43
397
@JacobSteinhardt
Jacob Steinhardt
3 years
How fast can you run a transformer model? I spent an unreasonably large amount of time (and space) figuring out the answer:
4
50
400
@JacobSteinhardt
Jacob Steinhardt
1 year
Can we build an LLM system to forecast geo-political events at the level of human forecasters? Introducing our work Approaching Human-Level Forecasting with Language Models! Arxiv: Joint work with @dannyhalawi15, @FredZhang0, and @jcyhc_ai
11
68
380
@JacobSteinhardt
Jacob Steinhardt
2 years
A core intuition I have about deep neural networks is that they are complex adaptive systems. This creates a number of control difficulties that are different from traditional engineering challenges:
8
77
325
@JacobSteinhardt
Jacob Steinhardt
2 years
I'm back to blogging, with some new thoughts on emergence: I answer the question: what are some specific emergent "failure modes" for ML systems that we should be on the lookout for?
5
49
225
@JacobSteinhardt
Jacob Steinhardt
3 years
To give an idea of just how much SOTA exceeded forecasters' expectations, here are the prediction intervals for the MATH and Massive Multitask benchmarks. Both outcomes exceeded the 95th percentile prediction.
6
35
204
@JacobSteinhardt
Jacob Steinhardt
4 months
In 2021, our research group released the MATH dataset. In the paper, we attribute the data to math contests released by the Mathematical Association of America (MAA), which is in the public domain. I've recently become aware that this is mistaken--while MATH contains MAA data…
4
14
210
@JacobSteinhardt
Jacob Steinhardt
3 years
Awesome to see @DeepMind's recent language modeling paper include our forecasts as a comparison point! Hopefully more papers track progress relative to forecasts so that we can better understand the pace of progress in deep learning.
1
23
194
@JacobSteinhardt
Jacob Steinhardt
3 years
On my blog, I've recently been discussing emergent behavior and in particular the idea that "More is Different". As part of this, I've compiled a list of examples across a variety of domains:
3
26
188
@JacobSteinhardt
Jacob Steinhardt
3 years
The US Embassy in London must approve Kayo's visa immediately. This is embarrassing and will harm US competitiveness in AI. Please retweet!
4
23
173
@JacobSteinhardt
Jacob Steinhardt
2 years
Since GPT-4 was released last week, I decided to switch things up from AI-related blogging and instead talk about research group culture. In my group, I've come up with a set of principles to help foster healthy and productive group meetings:
2
18
183
@JacobSteinhardt
Jacob Steinhardt
3 years
Kayo has already done stellar machine learning work for her Master's degree at CMU, one of the top US universities. ML expertise is sorely needed in the US. Is the U.S. really so eager to shoot itself in the foot?
1
4
164
@JacobSteinhardt
Jacob Steinhardt
2 months
This is an important paper that everyone should read (perhaps the most interesting one this year). It provides a trend line for how AI autonomy is increasing over time. My take: the results are evidence against general autonomy in the next few years, but make it seem more likely in 2029-33.
@METR_Evals
METR
2 months
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
3
16
150
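To make the doubling-time trend above concrete, here is a minimal sketch of the implied extrapolation. The 7-month doubling period comes from the quoted METR tweet; the starting task length and the horizons printed below are hypothetical placeholders, not figures from the paper.

```python
DOUBLING_MONTHS = 7  # doubling time for agent task length, per the quoted METR result

def task_length(months_from_now: float, current_length_hours: float = 1.0) -> float:
    """Extrapolate the task length (in hours) an agent can complete,
    assuming exponential growth with a fixed doubling time."""
    return current_length_hours * 2 ** (months_from_now / DOUBLING_MONTHS)

# Hypothetical illustration: if agents can handle ~1-hour tasks today, the trend
# implies roughly month-long (~160-hour) tasks after a little over 4 years.
for months in (0, 12, 24, 36, 48, 60):
    print(f"{months:3d} months out: ~{task_length(months):6.1f}-hour tasks")
```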
@JacobSteinhardt
Jacob Steinhardt
3 years
Finally, while forecasters underpredicted progress on capabilities, they *overpredicted* progress on robustness. So while capabilities are advancing quickly, safety properties may be behind schedule. A troubling thought.
2
27
143
@JacobSteinhardt
Jacob Steinhardt
3 years
Kayo's semester starts in one week. She's a French citizen who has spent significant time in the U.S. In addition to all required documents, we've sent extensive additional docs to "prove" that Kayo is really coming to Berkeley. There's no reason this can't be approved tomorrow.
2
4
120
@JacobSteinhardt
Jacob Steinhardt
8 months
A new blog post, this time a guest post by my student @ZhongRuiqi. Ruiqi has some very cool work defining a family of statistical models that can include natural language descriptions as part of their parameter space:
2
22
130
@JacobSteinhardt
Jacob Steinhardt
1 year
To understand future risks to humanity, we should first understand history. I analyzed historical catastrophes, their frequency, and their causes:
6
22
109
@JacobSteinhardt
Jacob Steinhardt
2 years
I worry about tail risks from future AI systems, but I haven't read descriptions that feel plausible to me, so I tried writing some of my own: This led to four vignettes covering cyberattacks, economic competition, and bioterrorism.
4
19
110
@JacobSteinhardt
Jacob Steinhardt
3 years
I've known Anna for a long time now, and she's one of the most impressive junior ML researchers around. She also holds herself to high standards of integrity. I've been impressed with how well she's handled this situation. Let's give her and Azalia our support.
1
5
106
@JacobSteinhardt
Jacob Steinhardt
2 years
ML systems are different from traditional software, in that most of their properties are acquired from data, without explicit human intent. This is unintuitive and creates new types of risk. In this blog post I talk about one such risk: unwanted drives
2
18
93
@JacobSteinhardt
Jacob Steinhardt
3 years
A blog post series on a key way I've changed my mind about ML: the (relative) value of empirical data vs. thought experiments for predicting future ML developments.
1
18
96
@JacobSteinhardt
Jacob Steinhardt
2 years
I quite enjoyed this workshop, and was pretty happy with the talk I gave (new and made ~from scratch!). My topic was using LLMs to help us understand LLMs, and covers great work by @TongPetersb, @ErikJones313, @ZhongRuiqi +others. You can watch it here:
1
16
89
@JacobSteinhardt
Jacob Steinhardt
4 years
New blog post and new blog! We paid professional forecasters to predict AI in 2025; I wrote up what we found.
1
17
84
@JacobSteinhardt
Jacob Steinhardt
3 years
I suspect most of us in the ML field still haven't internalized how quickly ML capabilities are advancing. We should be preregistering forecasts so that we can learn and correct! I intend to do so for June 2023.
1
6
81
@JacobSteinhardt
Jacob Steinhardt
3 years
Findings:
* Forecasters significantly underpredicted progress.
* But were more accurate than me (I underpredicted progress even more!).
* Also were (probably) more accurate than median ML researcher.
1
8
79
@JacobSteinhardt
Jacob Steinhardt
3 months
New company founded by people I like. Good to see the focus on openness and transparency --- this will help the scientific community and public better understand the behavior and implications of AI.
@thinkymachines
Thinking Machines
3 months
Today, we are excited to announce Thinking Machines Lab, an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT…
1
0
82
@JacobSteinhardt
Jacob Steinhardt
6 months
Transluce is building open and scalable tech addressing some of the biggest questions in AI: how can we understand and predict the behavior of AI systems, and know when they’re safe to deploy? Want to chat at NeurIPS? RSVP here:
@TransluceAI
Transluce
6 months
Transluce will be at #NeurIPS2024! Who’s coming to lunch on Thursday to meet the team and learn about open problems we're working on? Space is limited, RSVP soon.
0
5
72
@JacobSteinhardt
Jacob Steinhardt
2 years
Over the past two years, I and many other forecasters registered predictions about the state-of-the-art accuracy on ML benchmarks in 2022-2025. In this blog post, I evaluate the predictions for 2023:
3
13
73
@JacobSteinhardt
Jacob Steinhardt
3 years
Satrajit Chatterjee (the subject of the article) is portrayed as being fired after raising scientific concerns with Azalia Mirhoseini and Anna Goldie's Nature paper on chip design. In reality, Chatterjee waged a years-long campaign to harass & undermine their work.
1
1
69
@JacobSteinhardt
Jacob Steinhardt
5 years
Gov. Cuomo recently said that he's using R = 1.1 as a trigger point for “circuit breaking” New York’s reopening. This is a weird policy that doesn't make sense, but not because we should use R = 1 instead. 1/N.
3
12
59
@JacobSteinhardt
Jacob Steinhardt
3 years
Google's statement says Chatterjee was "terminated with cause". This is an unusually strong statement and shows Google had serious problems with him. The NYT should know this, so it's unclear why they paint it as "he said, she said" (and give most of the space to Chatterjee).
1
1
50
@JacobSteinhardt
Jacob Steinhardt
3 years
I argue that while ML models have undergone many qualitative shifts (and will continue to do so), many empirical findings hold up well even across these shifts: Part of the "More is Different" series on my blog!
1
5
48
@JacobSteinhardt
Jacob Steinhardt
3 years
Interestingly, forecasters' biggest miss was on the MATH dataset, where @alewkowycz @ethansdyer and others set a record of 50.3% on the very last day of June! One day made a huge difference.
2
6
48
@JacobSteinhardt
Jacob Steinhardt
5 years
New paper on household transmission of SARS-CoV-2: with @mihaela_curmei, @andrew_ilyas, and @OwainEvans_UK. Very interested in feedback! We show that under lockdowns, 30-55% of transmissions occur in houses. 1/4.
2
13
49
@JacobSteinhardt
Jacob Steinhardt
2 years
My tutorial slides on Aligning ML Systems are now online, in HTML format, with clickable references! [NB some minor formatting errors were introduced when converting to HTML].
0
7
45
@JacobSteinhardt
Jacob Steinhardt
3 years
It's particularly gross that the article repeatedly draws parallels with Timnit Gebru's firing, which is completely different in terms of the facts on the ground. Timnit agrees: Seems clear that NYT did this for clicks.
@timnitGebru
@timnitGebru (@dair-community.social/bsky.social)
3 years
I haven't read this @nytimes article by @daiwaka & @CadeMetz. But I had heard about the person from many ppl. To the extent the story is connected to mine, it's ONLY the pattern of action on toxic men taken too late while ppl like me are retaliated against.
2
2
41
@JacobSteinhardt
Jacob Steinhardt
3 years
Of course, NYT's in the business of clicks. But they should draw the line when giving a bully a platform to continue to harass two junior researchers.
1
2
38
@JacobSteinhardt
Jacob Steinhardt
1 year
Nora is a super creative thinker and very capable engineer. I'd highly recommend working for her if you want to do cool work on understanding ML models at an open-source org!
@norabelrose
Nora Belrose
1 year
My Interpretability research team at @AiEleuther is hiring! If you're interested, please read our job posting and submit:
1. Your CV
2. Three interp papers you'd like to build on
3. Links to cool open source repos you've built
to contact@eleuther.ai
5
0
38
@JacobSteinhardt
Jacob Steinhardt
2 years
Some nice pushback on my GPT-2030 post by @xuanalogue, with lots of links!
@xuanalogue
xuan (ɕɥɛn / sh-yen)
2 years
I respect Jacob a lot but I find it really difficult to engage with predictions of LLM capabilities that presume some version of the scaling hypothesis will continue to hold - it just seems highly implausible given everything we already know about the limits of transformers!.
2
2
36
@JacobSteinhardt
Jacob Steinhardt
4 months
…the majority is from Art of Problem Solving (AoPS)'s Alcumus platform, which contains problems written by AoPS, MATHCOUNTS, & other orgs/individuals. I was not aware of this at the time of publication, but as senior author I should have checked, and I take full responsibility.
1
0
36
@JacobSteinhardt
Jacob Steinhardt
4 years
Is remote work slower? I estimate 0-50% slower for many tasks, but for some tasks (esp. branching into new areas/skillsets) it can easily be 5x slower. Easy to underestimate for managers, but huge effect:
2
2
32
@JacobSteinhardt
Jacob Steinhardt
2 years
Complex adaptive systems follow the law of unintended consequences: straightforward attempts to control traffic, ecosystems, firms, or pathogens fail in unexpected ways. And we can see similar issues in deep networks with reward hacking and emergence.
1
2
30
@JacobSteinhardt
Jacob Steinhardt
2 years
In particular, I project that "GPT-2030" will have a number of properties that are surprising relative to current systems:
1. Superhuman abilities at specific tasks, such as math, programming, and hacking.
2. Fast inference speed and throughput (enough to run millions of copies).
3
5
31
@JacobSteinhardt
Jacob Steinhardt
2 years
4. Consider not building certain systems. In biology, some gain-of-function research is heavily restricted, and there are significant safeguards around rapidly-evolving systems like pathogens. We should ask if and when similar principles should apply in machine learning.
1
1
26
@JacobSteinhardt
Jacob Steinhardt
4 months
Why is this important? First, scientifically, it is important to correctly attribute the provenance of data---among other reasons, to know which data should be kept separate from model training.
1
0
25
@JacobSteinhardt
Jacob Steinhardt
2 years
Based on this, I examine a number of principles for improving the safety of deep learning systems that are inspired by the complex systems literature:
1. Build sharp cliffs in the reward landscape around bad behaviors, so that models never explore them in the first place.
1
2
25
@JacobSteinhardt
Jacob Steinhardt
2 years
@EpochAIResearch is one of the coolest (and in my opinion underrated) research orgs for understanding trends in ML. Rather than speculating, they meticulously analyze empirical trends and make projections for the future. Lots of interesting findings in their data!
@pvllss
Pablo Villalobos 🔸
3 years
We at @EpochAIResearch recently published a new short report! In "Trends in Training Dataset Sizes", we explore the growth of ML training datasets over the past few decades. Doubling time has historically been 16 months for language datasets and 41 months for vision. 🧵1/3
0
4
24
@JacobSteinhardt
Jacob Steinhardt
2 years
For much more detail on all of this (and more), please read the post!
2
0
24
@JacobSteinhardt
Jacob Steinhardt
2 years
2. Train models to self-regulate and have limited aims.
3. Pretraining shapes most of the structure of a model. Consider what heuristics you are baking in at pretraining time, rather than relying on fine-tuning to fix problems.
1
1
24
@JacobSteinhardt
Jacob Steinhardt
2 years
I've previously made forecasts for mid-2023 (which I'll discuss in July once they resolve). Thinking 7 years out is obviously much harder, but I think important for preparing for the future impacts of ML.
2
0
24
@JacobSteinhardt
Jacob Steinhardt
4 years
Many have heard of deliberate practice, but I identify another important mental stance called *deliberate play*. Deliberate play is intentional, but with a softer focus. Deliberate practice develops skills; deliberate play develops frameworks.
0
3
23
@JacobSteinhardt
Jacob Steinhardt
3 years
What will SOTA for ML benchmarks be in 2023? I forecast results for the MATH and MMLU benchmarks, two benchmarks that have had surprising progress in the past year:
1
5
21
@JacobSteinhardt
Jacob Steinhardt
3 years
In the next post of this series, I argue that when predicting the future of ML, we should not simply expect existing empirical trends to continue. Instead, we will often observe qualitatively new, "emergent" behavior:
@JacobSteinhardt
Jacob Steinhardt
3 years
A blog post series on a key way I've changed my mind about ML: the (relative) value of empirical data vs. thought experiments for predicting future ML developments.
0
2
21
@JacobSteinhardt
Jacob Steinhardt
2 years
3. Parallel learning. Because copies have identical weights, can propagate millions of gradient updates in parallel. This means models could rapidly learn new tasks (including "bad" tasks like manipulation/misinformation).
1
1
21
@JacobSteinhardt
Jacob Steinhardt
2 years
4. New modalities. Beyond tool use and images, may be trained on proteins, astronomical images, networks, etc. Therefore could have strong intuitive grasp of these more "exotic" domains.
2
2
20
@JacobSteinhardt
Jacob Steinhardt
4 months
More importantly, while MAA data is public domain, the data from Alcumus is not--it belongs to Art of Problem Solving, MATHCOUNTS, and others.
1
0
19
@JacobSteinhardt
Jacob Steinhardt
4 months
…perhaps by having key orgs in the ML community chip in to compensate AoPS, MATHCOUNTS, and others that contributed to MATH for the amazing data they have created and we have all benefited from.
1
0
19
@JacobSteinhardt
Jacob Steinhardt
2 years
I elaborate on these and consider several additional ideas in the blog post itself. Thanks to @DanHendrycks for first articulating the complex systems perspective on deep learning to me. He's continuing to do great work in that and other directions at
0
0
18
@JacobSteinhardt
Jacob Steinhardt
4 months
Since MATH leaks problems and solutions from Alcumus and elsewhere on the AoPS website, for now I've taken it off Berkeley's web server. I'm currently engaged in friendly conversations with AoPS, and my hope is we will find a long-term solution that allows MATH to be re-hosted.
1
0
18
@JacobSteinhardt
Jacob Steinhardt
3 years
For predicting what future ML systems will look like, it's helpful to have "anchors"---reference classes that are broadly analogous to future ML. Common anchors include "current ML" and "humans", but I think there's many other good choices:
2
3
17
@JacobSteinhardt
Jacob Steinhardt
2 years
I then consider a few ways GPT-2030 could affect society. Importantly, there are serious misuse risks (such as hacking and persuasion) that we should address. These are just two examples, and generally I favor more work on forward-looking analyses of societal impacts.
4
1
16
@JacobSteinhardt
Jacob Steinhardt
1 year
In this work, we build a LM pipeline for automated forecasting. Given any question about a future event, it retrieves and summarizes relevant articles, reasons about them, and predicts the probability that the event occurs.
2
0
16
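A minimal sketch of the retrieve-summarize-reason-predict loop described above; the class and the `search`/`llm` callables are hypothetical stand-ins for illustration, not the actual implementation from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ForecastingPipeline:
    """Toy retrieve -> summarize -> reason -> predict loop for a forecasting question."""
    search: Callable[[str], List[str]]  # hypothetical: returns relevant news articles
    llm: Callable[[str], str]           # hypothetical: returns a text completion

    def forecast(self, question: str, k: int = 10) -> float:
        articles = self.search(question)[:k]
        summaries = [
            self.llm(f"Summarize this article as it bears on '{question}':\n{a}")
            for a in articles
        ]
        reasoning = self.llm(
            f"Question: {question}\nEvidence summaries:\n" + "\n".join(summaries)
            + "\nReason step by step, then end with a probability between 0 and 1."
        )
        # Take the last number in the reasoning as the probability estimate.
        numbers = [t for t in reasoning.replace("%", " ").split()
                   if t.replace(".", "", 1).isdigit()]
        prob = float(numbers[-1]) if numbers else 0.5
        if prob > 1:  # handle answers given in percent
            prob /= 100
        return min(max(prob, 0.0), 1.0)
```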
@JacobSteinhardt
Jacob Steinhardt
4 months
AoPS is an incredible org that I benefited from as a high school student. They create best-in-world materials for learning mathematical problem solving, including those in Alcumus. They and the other contributing orgs deserve appropriate credit and compensation for their work.
1
1
16
@JacobSteinhardt
Jacob Steinhardt
3 years
If you want to join me on this, you can register predictions on Metaculus for the MATH and Massive Multitask benchmarks. It's pretty easy--just need a Google account. The MATH one is open now and Multitask should be open soon.
@JacobSteinhardt
Jacob Steinhardt
3 years
I suspect most of us in the ML field still haven't internalized how quickly ML capabilities are advancing. We should be preregistering forecasts so that we can learn and correct! I intend to do so for June 2023.
3
4
16
@JacobSteinhardt
Jacob Steinhardt
3 years
@aghobarah Definitely agree in terms of research track record. But in terms of professional standing, Anna's a PhD student and Azalia's on the academic job market right now. This is important, because it means their careers are more affected by this sort of press (vs. a tenured prof).
0
0
15
@JacobSteinhardt
Jacob Steinhardt
5 years
If you’re interested in this, @andrew_ilyas and I have a working paper discussing these issues in more detail:
0
1
14
@JacobSteinhardt
Jacob Steinhardt
5 years
@chhaviyadav_ Consulates are closed due to COVID-19, so incoming international students can't apply for visas. This has been true for a while, but it is now at the point where it is affecting students directly. See e.g. this June letter from GOP representatives asking Pompeo to fix it:
1
1
14
@JacobSteinhardt
Jacob Steinhardt
3 years
For those interested in the original forecasts, you can read our blog post here:
1
0
15
@JacobSteinhardt
Jacob Steinhardt
5 years
Some exciting new work by my student @DanHendrycks and collaborators. We identify seven hypotheses about OOD generalization in the literature, and collect several new datasets to test these. Trying to add more "strong inference" to ML (cf. Platt 1964).
@DanHendrycks
Dan Hendrycks
5 years
What methods actually improve robustness? In this paper, we test robustness to changes in geography, time, occlusion, rendition, real image blurs, and so on with 4 new datasets. No published method consistently improves robustness.
0
1
13
@JacobSteinhardt
Jacob Steinhardt
5 years
Curated list of documented police abuse during protests: Compilations like this are a compelling reminder that George Floyd is the most salient instance of a broader trend. (And remember: there are also many good police who are supporting protestors.)
0
1
14
@JacobSteinhardt
Jacob Steinhardt
4 months
I'm optimistic we can resolve this in a way that ends well for everyone, and am working towards that. For now, I'd request that anyone who is hosting a mirror of MATH also take it down until further notice; and for anyone who uses MATH in their work to credit the problem writers:
1
0
13
@JacobSteinhardt
Jacob Steinhardt
4 months
AoPS & the AoPS Community, MATHCOUNTS, the MAA, the Centre for Education in Mathematics and Computing, the Harvard-MIT Math Tournament, the Math Prize for Girls, MOEMS, the Mandelbrot Competition, and the Institute of Mathematics and Applications.
1
0
12
@JacobSteinhardt
Jacob Steinhardt
5 years
Good to see this analysis, but the headline is misleading. 24 states have *point estimates* over 1, but the uncertainty in the estimates is large. Consider the null hypothesis that Rt=0.95 everywhere: we would then expect 19 states with estimates above 1 (eyeballing stdev=0.17 from fig. 4).
@MRC_Outbreak
MRC Centre for Global Infectious Disease Analysis
5 years
UPDATE: #covid19science #COVID19 in USA
➡️ Initial national average reproduction number R was 2.2
➡️ 24 states have Rt over 1
➡️ Increasing mobility cause resurgence (doubling number of deaths in 8 weeks)
➡️ 4.1% of people infected nationally
🔰 Report
1
1
13
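A quick numerical check of the back-of-the-envelope argument in the tweet above, assuming state-level estimates are roughly normal around the true value with the eyeballed standard deviation of 0.17.

```python
from scipy.stats import norm

true_rt = 0.95   # null hypothesis: Rt is really 0.95 in every state
threshold = 1.0  # states get reported as having "Rt over 1"
stdev = 0.17     # eyeballed spread of the state-level estimates (fig. 4)
n_states = 50

# Probability that a single state's noisy estimate lands above 1 under the null.
p_above = 1 - norm.cdf((threshold - true_rt) / stdev)
print(f"P(estimate > 1) = {p_above:.2f}")                    # ~0.38
print(f"Expected states above 1 = {n_states * p_above:.0f}")  # ~19
```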
@JacobSteinhardt
Jacob Steinhardt
1 year
Moreover, averaging our prediction with the crowd consistently outperforms the crowd itself (as measured by Brier score, the most commonly-used metric of forecasting performance).
1
1
13
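A minimal sketch of the Brier-score comparison mentioned above; the forecasts and outcomes below are made-up toy numbers chosen so that the averaged forecast comes out best, not data from the paper.

```python
import numpy as np

def brier(probs, outcomes):
    """Brier score: mean squared error between forecast probabilities and
    binary outcomes (lower is better)."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return float(np.mean((probs - outcomes) ** 2))

# Toy forecasts on four hypothetical yes/no questions.
crowd = [0.9, 0.1, 0.4, 0.6]
model = [0.5, 0.5, 0.9, 0.2]
outcomes = [1, 0, 1, 0]
averaged = [(c + m) / 2 for c, m in zip(crowd, model)]

print("crowd   :", brier(crowd, outcomes))     # 0.185
print("model   :", brier(model, outcomes))     # 0.1375
print("averaged:", brier(averaged, outcomes))  # ~0.116 (lowest)
```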
@JacobSteinhardt
Jacob Steinhardt
1 year
We compare our system to ensembles of competitive human forecasters ("the crowd"). We approach the performance of the crowd across all questions, and beat the crowd on questions where they are less confident (probabilities between 0.3 and 0.7).
2
1
13
@JacobSteinhardt
Jacob Steinhardt
5 years
The actual issue is that R is not a good metric for directly setting policy, because it's difficult to estimate and far-removed from things in the world we care about, like hospital demand.
1
1
9
@JacobSteinhardt
Jacob Steinhardt
4 months
(There may be a few other small sources; these are the ones that AoPS has confirmed so far.) We will hopefully have more updates soon!.
0
0
12
@JacobSteinhardt
Jacob Steinhardt
1 year
Our system has a number of interesting properties. For instance, our forecasted probabilities are well-calibrated, even though we perform no explicit calibration and even though the base models themselves are not (!).
1
1
12
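A minimal sketch of how calibration can be checked by binning forecasts; the data below is randomly generated for illustration and has nothing to do with the paper's actual forecasts.

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Bin forecasts by predicted probability and compare each bin's mean forecast
    to the observed frequency (well-calibrated forecasts have these close)."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((b / n_bins, probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows  # (bin start, mean forecast, observed frequency, count)

# Synthetic data: outcomes are drawn from the forecast probabilities themselves,
# so mean forecast and observed frequency should roughly match in every bin.
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 5000)
y = rng.binomial(1, p)
for start, mean_p, freq, n in calibration_table(p, y):
    print(f"[{start:.1f}, {start + 0.1:.1f}): forecast {mean_p:.2f} vs observed {freq:.2f} (n={n})")
```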
@JacobSteinhardt
Jacob Steinhardt
5 years
Lots of people hating on hydroxychloroquine because Trump likes it. But just because Trump likes something doesn't mean it kills people. Maybe it does, but let's demand real evidence instead of giving shoddy science a pass.
1
1
12
@JacobSteinhardt
Jacob Steinhardt
5 years
What's going on with Georgia? They've been "open" for a while now and there's been no apparent spike in cases. I don't think this can just be poor testing because other data sources (e.g. FB surveys) show same thing: 1/5.
2
0
12
@JacobSteinhardt
Jacob Steinhardt
2 years
I *also* still think there are unknown unknowns, and we should probably slow down and understand what current large ML systems are doing, before rushing to deploy new ones. But hopefully concrete behaviors will open the door to concrete research towards addressing them.
1
2
11
@JacobSteinhardt
Jacob Steinhardt
3 years
Examples include gecko feet, operating systems, economic specialization, hemoglobin, polymers, eyes, ant colonies, transistors, cities, and skill acquisition. If you're interested in reading about how this applies to ML, check out the full blog series!
1
2
11
@JacobSteinhardt
Jacob Steinhardt
2 years
@xuanalogue Only thing missing is a counter-prediction so we can compare in 7 years :).
1
0
10
@JacobSteinhardt
Jacob Steinhardt
2 years
Overall, each scenario requires a few things to "go right" for the rogue AI system; I think of them as moderate but not extreme tail events, and assign ~5% probability to "something like" one of these scenarios happening by 2050. (w/ additional prob. on other/unknown scenarios).
2
1
10
@JacobSteinhardt
Jacob Steinhardt
3 years
And the DeepMind paper itself:
1
0
10
@JacobSteinhardt
Jacob Steinhardt
2 years
In research, it's important to create an environment that allows for risk-taking and mistakes, while also pushing eventually towards excellence and innovation. I aim to set discussion norms that promote both of these.
1
0
10
@JacobSteinhardt
Jacob Steinhardt
5 years
Some great recommendations from Chloe Cockburn (a program officer at Open Philanthropy, where I worked last summer). My understanding is that DA elections (starts at #9 on the list) are a high-impact route to police and criminal justice reform.
0
0
10
@JacobSteinhardt
Jacob Steinhardt
7 months
@michael_c_grant @mengk20 @TransluceAI That was our first hypothesis, but getting rid of software versions only helps a little bit! 1% vs. 22% increase across a dataset of examples. (There are many more Bible verses than software versions in the training data.) See our write-up for details!
1
1
10
@JacobSteinhardt
Jacob Steinhardt
1 year
Second, our model underperforms on "easy" questions (where the answer is nearly certain), because it is unwilling to give probabilities very close to 0 or 1. This is possibly an artifact of its safety training.
1
1
10
@JacobSteinhardt
Jacob Steinhardt
8 months
I think this framework is very powerful for finding explainable patterns in large datasets, and I've already begun to use it for explainability challenges I'm facing in other projects. I'd encourage you to check out the blog post, as well as the paper:
@ZhongRuiqi
Ruiqi Zhong
8 months
Paper: Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Link:
Code (try it out!):
Blog post:
Joint work with @HengWang_xjtu, Dan Klein, and @JacobSteinhardt.
0
0
8
@JacobSteinhardt
Jacob Steinhardt
1 year
For some cool related work, see which examines human-LLM forecasting teams, and and which introduce AI forecasting competitions.
1
0
9
@JacobSteinhardt
Jacob Steinhardt
1 year
Signal-boosting this pushback since Nuño has a strong forecasting track record. I agree AI part is not traditional ref. class analysis, but think "AI is an adaptive self-replicator, this often causes problems" is importantly less inside-view than [long arg. about paperclips].
@NunoSempere
Nuño Sempere in London 9-16/May
1 year
@JacobSteinhardt @DhruvMadeka I like the overall analysis. I think that the move of noticing that AIs might share some characteristics with pandemics, in that AIs might be self-replicating, is an inside-view move, and I don't feel great about characterizing that as a reference class analysis.
1
0
9
@JacobSteinhardt
Jacob Steinhardt
3 years
Interesting opportunity to do mechanistic interpretability research! (I have worked/collaborated with Redwood and enjoyed it.)
@NeelNanda5
Neel Nanda
3 years
I'm helping Redwood Research run REMIX, a 1-month mechanistic interpretability sprint where 25+ people will reverse engineer circuits in GPT-2 Small. This seems like a great way to get experience exploring @ch402's transformer circuits work. Apply by 13th Nov!
0
0
9
@JacobSteinhardt
Jacob Steinhardt
5 years
Open letter on police reform at UC Berkeley. I helped draft this, together with several amazing students. If you're at UCB and want to sign, please get in touch via e-mail. UCB has already pursued some good reforms, but there's much more to be done.
0
0
6
@JacobSteinhardt
Jacob Steinhardt
1 year
We are excited to continue this work! Please email @dannyhalawi15 at dannyhalawi15@gmail.com to get in touch.
3
0
9