Arvind Narayanan @random_walker X Profile

Arvind Narayanan

@random_walker

Followers: 126K

Following: 20K

Media: 896

Statuses: 13K

Princeton CS prof and Director @PrincetonCITP. Coauthor of "AI Snake Oil" and "AI as Normal Technology". https://t.co/ZwebetjZ4n Views mine.

https://t.co/px6fpS9QFq

Princeton, NJ

Joined December 2007


Arvind Narayanan

@random_walker

3 months

I’m excited to announce I’ve started a YouTube channel. I plan to publish videos regularly explaining my views on AI and its present and future impacts. My first video asks: What happens if there’s an AI crash? https://t.co/sEGeoCyHmk This is my first foray into video (beyond

17

50

334

Arvind Narayanan

@random_walker

2 days

📢📢 I'm looking for a postdoctoral fellow and so are many of my amazing faculty colleagues @PrincetonCITP. The center's mission is to understand and improve the relationship between tech and society. Apply soon for full consideration. Details: https://t.co/AGphhLkU60 The center

1

21

96

Ahmad Beirami ✈️ NeurIPS

@abeirami

2 days

💯 absolutely right! "We think studying the coupling between models and scaffolds is an important research direction going forward, especially as more developers release scaffolds that their models might be finetuned to work well with"

Sayash Kapoor

@sayashk

3 days

CORE-Bench is solved (using Opus 4.5 with Claude Code) TL;DR: Last week, we released results for Opus 4.5 on CORE-Bench, a benchmark that tests agents on scientific reproducibility tasks. Earlier this week, Nicholas Carlini reached out to share that an updated scaffold that uses

0

3

50

Alex Dimakis

@AlexGDimakis

2 days

Agent scaffolds are as important as models.

Sayash Kapoor

@sayashk

3 days

CORE-Bench is solved (using Opus 4.5 with Claude Code) TL;DR: Last week, we released results for Opus 4.5 on CORE-Bench, a benchmark that tests agents on scientific reproducibility tasks. Earlier this week, Nicholas Carlini reached out to share that an updated scaffold that uses

10

20

200

Sayash Kapoor

@sayashk

3 days

CORE-Bench is solved (using Opus 4.5 with Claude Code) TL;DR: Last week, we released results for Opus 4.5 on CORE-Bench, a benchmark that tests agents on scientific reproducibility tasks. Earlier this week, Nicholas Carlini reached out to share that an updated scaffold that uses

26

104

740

BURKOV

@burkov

3 days

This paper really is groundbreaking. It solves a long-standing embarrassment in machine learning: despite all the hype around deep learning, traditional tree-based methods (XGBoost, CatBoost, random forests, etc) have dominated tabular data—the most common data format in

74

398

3K

Derek Thompson

@DKThomp

4 days

This is a great piece with some mind-boggling statistics. - At Brown and Harvard, more than 20% of undergraduates are registered as disabled - At Amherst: more than 30 percent - At Stanford: nearly 40 percent Soon, many of these schools "may have more students receiving

987

3K

17K

Dr. Jon Slotkin

@slotkinjr

4 days

I have a guest essay in @nytimes today about autonomous vehicle safety. I wrote it because I’m tired of seeing children die. Done right, we can eliminate car crashes as a leading cause of death in the United States @Waymo recently released data covering nearly 100 million

326

1K

6K

Harmonic

@HarmonicMath

6 days

Many of us intuitively feel that the field of mathematics is going to change, so let's unpack the likely outcomes, without resorting to hyperbole or doomerism.

10

30

267

Itamar Caspi

@itamarcaspi

12 days

Right now this "agentic reviewer" is like putting a newspaper online as HTML. It imitates existing referee reports and optimizes agreement with scores, so it learns the current tastes, fads, and biases of human reviewers rather than questioning them. From an economic perspective

Andrew Ng

@AndrewYNg

12 days

Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and @jyx_su made it much better. I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was

8

13

154

Andy Masley

@AndyMasley

17 days

I think that Hao made a bad but honest mistake and I don't mean to attack her overall character as a journalist. In contrast, I would like to take this opportunity to directly attack the journalistic integrity of More Perfect Union, who are much more influential in the AI water

18

61

700

Timothy B. Lee

@binarybits

21 days

NYTimes article quotes someone saying they are "terrified" of Waymo in paragraph 6. Waits until paragraph 33 (out of 44 paragraphs) to mention that they are 91 percent safer than human drivers. How outraged would liberals be if a news outlet covered vaccines like this?

71

281

3K

John Loeber 🎢

@johnloeber

22 days

When I was in college at UChicago, I dated another student from Appalachia. She once told me she had gotten straight As in high school calculus -- and when she took the AP exam, she got a 1, the worst possible grade. (And she was pretty talented, she later on did pretty well at

Kelley K

@KelleyKga

23 days

The fact is that high schools are graduating kids with As and Bs in advanced math courses who haven't mastered foundational skills. The data from the UCSD report makes that clear. 20% took calculus in high school! Their GPA in math classes is ~3.6! This is happening all over the

123

442

6K

Ruxandra Teslo 🧬

@RuxandraTeslo

22 days

Don't make me tap the sign

Dr Singularity

@Dr_Singularity

23 days

We will now be able to discover new drugs 1000s of times faster. Thanks to AI, all diseases will be curable during the 2030s. MADD - Multi Agent Drug Discovery Orchestra, a multi agent AI system designed to massively accelerate the early stages of drug discovery, especially

82

500

6K

Arvind Narayanan

@random_walker

22 days

@snewmanpv @sayashk @DKokotajlo @eli_lifland @thlarsen IMO reliability is only one of the limitations; the others are preference elicitation and the adversarial nature of the environment. I've written a bit about it here: https://t.co/r2lqcoeEQ4 (That said I do think the reliability issue alone is pretty challenging — you'd need ~3

Arvind Narayanan

@random_walker

1 year

Google's Deep Research is an excellent application of agentic capabilities. One example of something it can do pretty well is search for all my podcasts and interviews and create a webpage listing them. Cuts down effort at least 10x compared to doing it manually. The reason it works

2

1

17

Micah Goldblum

@micahgoldblum

23 days

An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3

40

146

1K

gavin leech (Non-Reasoning)

@g_leech_

23 days

Glad somebody did this (expert interviews on why LLMs are not currently AGI, and why they could be) feat: @random_walker, @DKokotajlo, @ben_j_todd, @daniel_d_kang, @rohinmshah

Geoffrey Irving

@geoffreyirving

1 month

New AISI report mapping cruxes behind whether AI progress might be fast or slow on the path to systems near or beyond human-level at most cognitive tasks. The goal is not to resolve uncertainties but reflect them: we don't know how AI will go, and should plan accordingly!

2

22

141

Daniel Kokotajlo

@DKokotajlo

24 days

Common ground between the authors of AI 2027 and AI as Normal Technology! Coauthored article below.

24

67

404

Arvind Narayanan

@random_walker

24 days

We enjoyed the opportunity for productive discussion with the authors of AI 2027 to find areas of common ground. We are also planning an “adversarial collaboration”.

11

31

256

Andy Masley

@AndyMasley

25 days

https://t.co/cfvtVNanUq