
Jacob Steinhardt
@JacobSteinhardt
Followers: 9K · Following: 185 · Media: 22 · Statuses: 412
Assistant Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI
Joined December 2011
In July, I went on leave from UC Berkeley to found @TransluceAI, together with Sarah Schwettmann (@cogconfluence). Now, our work is finally public.
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann:
2
19
348
My student Kayo Yin needs your help. Her visa has been unnecessarily delayed, which would prevent her from coming to UC Berkeley to start her studies. Despite her bringing all required documents, the @StateDept refused to process the visa, and it could take months to re-process.
25
306
1K
This NYT article on Azalia and Anna's excellent chip design work is gross, to the point of journalistic malpractice. It platforms a bully while drawing an absurd parallel to @timnitGebru's firing. @CadeMetz should be ashamed. (not linking so it doesn't get more clicks)
16
43
397
Can we build an LLM system to forecast geo-political events at the level of human forecasters? Introducing our work, Approaching Human-Level Forecasting with Language Models!
Arxiv:
Joint work with @dannyhalawi15, @FredZhang0, and @jcyhc_ai
11
68
380
Awesome to see @DeepMind's recent language modeling paper include our forecasts as a comparison point! Hopefully more papers track progress relative to forecasts so that we can better understand the pace of progress in deep learning.
1
23
194
This is an important paper that everyone should read (perhaps most interesting one this year). It provides a trend line for how AI autonomy is increasing over time. My take: results are evidence against general autonomy in next few years, but make it seem more likely in 2029-33.
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
3
16
150
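To make the 2029-33 reading concrete, here's a back-of-the-envelope sketch in Python. The ~1-hour current horizon and the 2025 reference date are my own assumptions for illustration (the quoted thread gives only the 7-month doubling time), so treat the outputs as rough.

```python
# Rough extrapolation of the "task length doubles every ~7 months" trend.
# ASSUMPTIONS (not from the thread): a ~1-hour task horizon as of 2025,
# and that the exponential trend simply continues.
DOUBLING_MONTHS = 7
CURRENT_HORIZON_HOURS = 1.0   # assumed horizon at the reference date
REFERENCE_YEAR = 2025.0

WORK_MONTH_HOURS = 167  # roughly one person-month of full-time work

for year in (2027, 2029, 2031, 2033):
    months = (year - REFERENCE_YEAR) * 12
    horizon = CURRENT_HORIZON_HOURS * 2 ** (months / DOUBLING_MONTHS)
    print(f"{year}: ~{horizon:,.0f} hours "
          f"(~{horizon / WORK_MONTH_HOURS:.1f} person-months)")
```

Under these assumptions, month-long autonomous projects only arrive around the early 2030s, which is one way to reconcile "not in the next few years" with "more likely in 2029-33."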
A new blog post, this time a guest post by my student @ZhongRuiqi. Ruiqi has some very cool work defining a family of statistical models that can include natural language descriptions as part of their parameter space:
2
22
130
I quite enjoyed this workshop, and was pretty happy with the talk I gave (new and made ~from scratch!). My topic was using LLMs to help us understand LLMs; the talk covers great work by @TongPetersb, @ErikJones313, @ZhongRuiqi, and others. You can watch it here:
1
16
89
New company founded by people who I like. Good to see the focus on openness and transparency --- this will help the scientific community and public better understand the behavior and implications of AI.
Today, we are excited to announce Thinking Machines Lab, an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT, …
1
0
82
Transluce is building open and scalable tech addressing some of the biggest questions in AI: how can we understand and predict the behavior of AI systems, and know when they’re safe to deploy? Want to chat at NeurIPS? RSVP here:
Transluce will be at #NeurIPS2024! Who’s coming to lunch on Thursday to meet the team and learn about open problems we're working on? Space is limited, RSVP soon.
0
5
72
Interestingly, forecasters' biggest miss was on the MATH dataset, where @alewkowycz @ethansdyer and others set a record of 50.3% on the very last day of June! One day made a huge difference.
2
6
48
New paper on household transmission of SARS-CoV-2: with @mihaela_curmei, @andrew_ilyas, and @OwainEvans_UK. Very interested in feedback! We show that under lockdowns, 30-55% of transmissions occur within households. 1/4
2
13
49
Nora is a super creative thinker and very capable engineer. I'd highly recommend working for her if you want to do cool work on understanding ML models at an open-source org!
My Interpretability research team at @AiEleuther is hiring! If you're interested, please read our job posting and submit:
1. Your CV
2. Three interp papers you'd like to build on
3. Links to cool open source repos you've built
to contact@eleuther.ai
5
0
38
Some nice pushback on my GPT-2030 post by @xuanalogue, with lots of links!
I respect Jacob a lot but I find it really difficult to engage with predictions of LLM capabilities that presume some version of the scaling hypothesis will continue to hold - it just seems highly implausible given everything we already know about the limits of transformers!
2
2
36
@EpochAIResearch is one of the coolest (and in my opinion underrated) research orgs for understanding trends in ML. Rather than speculating, they meticulously analyze empirical trends and make projections for the future. Lots of interesting findings in their data!
We at @EpochAIResearch recently published a new short report! In "Trends in Training Dataset Sizes", we explore the growth of ML training datasets over the past few decades. Doubling time has historically been 16 months for language datasets and 41 months for vision. 🧵1/3
0
4
24
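As a quick illustration of what those doubling times compound to, here is a small Python sketch. The 16- and 41-month figures are from the quoted report; the 10-year window is just my choice for illustration.

```python
# What a fixed doubling time implies over a decade of growth.
def growth_factor(doubling_months: float, horizon_months: float) -> float:
    """Multiplicative growth implied by a constant doubling time."""
    return 2 ** (horizon_months / doubling_months)

DECADE = 120  # months
for domain, doubling in (("language", 16), ("vision", 41)):
    print(f"{domain}: {DECADE / doubling:.1f} doublings, "
          f"~{growth_factor(doubling, DECADE):,.0f}x over 10 years")
```

That is roughly 180x growth for language datasets per decade versus about 8x for vision, if the historical trends simply continue.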
In the next post of this series, I argue that when predicting the future of ML, we should not simply expect existing empirical trends to continue. Instead, we will often observe qualitatively new, "emergent" behavior:
A blog post series on a key way I've changed my mind about ML: the (relative) value of empirical data vs. thought experiments for predicting future ML developments.
0
2
21
I elaborate on these and consider several additional ideas in the blog post itself. Thanks to @DanHendrycks for first articulating the complex systems perspective on deep learning to me. He's continuing to do great work in that and other directions at
0
0
18
If you want to join me on this, you can register predictions on Metaculus for the MATH and Massive Multitask benchmarks:
*
*
It's pretty easy--just need a Google account. The MATH one is open now and Multitask should be open soon.
I suspect most of us in the ML field still haven't internalized how quickly ML capabilities are advancing. We should be preregistering forecasts so that we can learn and correct! I intend to do so for June 2023.
3
4
16
@aghobarah Definitely agree in terms of research track record. But in terms of professional standing, Anna's a PhD student and Azalia's on the academic job market right now. This is important, because it means their careers are more affected by this sort of press (vs. a tenured prof).
0
0
15
If you’re interested in this, @andrew_ilyas and I have a working paper discussing these issues in more detail:
0
1
14
@chhaviyadav_ Consulates are closed due to COVID-19, so incoming international students can't apply for visas. This has been true for a while, but it's now at the point where it is affecting students directly. See e.g. this June letter from GOP representatives asking Pompeo to fix it:
1
1
14
Some exciting new work by my student @DanHendrycks and collaborators. We identify seven hypotheses about OOD generalization in the literature, and collect several new datasets to test these. Trying to add more "strong inference" to ML (cf. Platt 1964).
What methods actually improve robustness? In this paper, we test robustness to changes in geography, time, occlusion, rendition, real image blurs, and so on with 4 new datasets. No published method consistently improves robustness.
0
1
13
Good to see this analysis, but misleading headline. 24 states have *point estimates* over 1, but uncertainty in estimates is large. Let's consider null hypothesis that Rt=0.95 everywhere. Then would expect 19 states with estimates above 1 (eyeballing stdev=0.17 from fig. 4).
UPDATE: #covid19science #COVID19 in USA.
➡️ Initial national average reproduction number R was 2.2.
➡️ 24 states have Rt over 1.
➡️ Increasing mobility causes resurgence (doubling number of deaths in 8 weeks).
➡️ 4.1% of people infected nationally.
🔰 Report
1
1
13
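The expected-count arithmetic in the tweet checks out; here is a minimal sketch, using the stdev of 0.17 eyeballed from the report's fig. 4 above.

```python
# Under the null that Rt = 0.95 in all 50 states, with point estimates
# noisy at stdev ~0.17, how many states would show an estimate above 1
# purely by chance?
from scipy.stats import norm

rt_null, stdev, n_states = 0.95, 0.17, 50
p_above_one = 1 - norm.cdf(1.0, loc=rt_null, scale=stdev)
print(f"P(estimate > 1) = {p_above_one:.2f}")                     # ~0.38
print(f"expected states above 1 = {p_above_one * n_states:.1f}")  # ~19
```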
Some great recommendations from Chloe Cockburn (a program officer at Open Philanthropy, where I worked last summer). My understanding is that DA elections (starts at #9 on the list) are a high-impact route to police and criminal justice reform.
0
0
10
@michael_c_grant @mengk20 @TransluceAI That was our first hypothesis, but getting rid of software versions only helps a little bit! 1% vs. 22% increase across a dataset of examples. (There are many more Bible verses than software versions in the training data.) See our write-up for details!
1
1
10
I think this framework is very powerful for finding explainable patterns in large datasets, and I've already begun to use it for explainability challenges I'm facing in other projects. I'd encourage you to check out the blog post, as well as the paper:
Paper: Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Link:
Code (try it out!):
Blog post:
Joint work with @HengWang_xjtu, Dan Klein, and @JacobSteinhardt.
0
0
8
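For readers curious what "natural language descriptions as part of the parameter space" can look like, here is a minimal sketch of the idea as I understand it. `llm_judge` and `llm_propose` are hypothetical stand-ins for LLM calls, not the paper's actual API; see the linked paper and code for the real formulation.

```python
# Sketch: a clustering model whose "parameters" are natural-language
# predicates, fit by an EM-style alternation. Both helpers below are
# hypothetical placeholders for LLM calls.

def llm_judge(text: str, predicate: str) -> bool:
    """Hypothetical: ask an LLM whether `predicate` holds for `text`."""
    raise NotImplementedError

def llm_propose(texts: list[str]) -> str:
    """Hypothetical: ask an LLM for a predicate describing `texts`."""
    raise NotImplementedError

def fit(texts: list[str], predicates: list[str], n_iters: int = 5) -> list[str]:
    for _ in range(n_iters):
        # E-step: assign each text to a matching predicate (first match
        # here for simplicity; the paper treats denotations probabilistically).
        clusters = {p: [] for p in predicates}
        for t in texts:
            matches = [p for p in predicates if llm_judge(t, p)]
            clusters[matches[0] if matches else predicates[0]].append(t)
        # M-step: re-describe each nonempty cluster in natural language.
        predicates = [llm_propose(ts) if ts else p
                      for p, ts in clusters.items()]
    return predicates
```

The paper's actual framework is probabilistic and covers model families beyond clustering, but the alternating assign/describe structure above is the gist.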
Signal-boosting this pushback since Nuño has a strong forecasting track record. I agree AI part is not traditional ref. class analysis, but think "AI is an adaptive self-replicator, this often causes problems" is importantly less inside-view than [long arg. about paperclips].
@JacobSteinhardt @DhruvMadeka I like the overall analysis. I think that the move of noticing that AIs might share some characteristics with pandemics, in that AIs might be self-replicating, is an inside-view move, and I don't feel great about characterizing that as a reference class analysis.
1
0
9
Interesting opportunity to do mechanistic interpretability research! (I have worked/collaborated with Redwood and enjoyed it.)
I'm helping Redwood Research run REMIX, a 1-month mechanistic interpretability sprint where 25+ people will reverse engineer circuits in GPT-2 Small. This seems like a great way to get experience exploring @ch402's transformer circuits work. Apply by 13th Nov!
0
0
9
We are excited to continue this work! Please email @dannyhalawi15 at dannyhalawi15@gmail.com to get in touch.
3
0
9