Stephen Bach @stevebach X Profile

Stephen Bach

@stevebach

Followers

2K

Following

4K

Media

33

Statuses

2K

Asst. prof. @BrownCSDept. Working on improving how humans teach computers. Weak supervision, zero-shot learning, few-shot learning, and high-level knowledge.

https://t.co/2qTXEhGgkq

Joined August 2007

Yong Zheng-Xin (Yong)

@yong_zhengxin

3 days

🚨 Reasoning models can “self-jailbreak”: they recognize a request is harmful, invent a reason why it’s fine, then help with it. We found that after training on benign math/code reasoning, models emergently start to reason themselves out of safety alignment. 🧵👇

1

6

16

Emma Pierson

@2plus2make5

15 days

Do you have many models to choose from and little labeled data with which to evaluate them? Check out our #neurips2025 paper, which presents a method to estimate model performance more accurately than previous methods using both labeled + unlabeled data.

Divya Shanmugam

@dmshanmugam

15 days

New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.

2

12

107

David Alvarez Melis

@elmelis

17 days

📄 New preprint alert: We study 🪃Boomerang Distillation🪃, a surprising phenomenon that allows generating a family of pre-trained LLMs of intermediate sizes from a single teacher–student pair — 𝐧𝐨 𝐞𝐱𝐭𝐫𝐚 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐫𝐞𝐪𝐮𝐢𝐫𝐞𝐝! 🧵👇

2

25

120

Avi Chawla

@_avichawla

18 days

Finally, Python 3.14 lets you disable GIL! It's a big deal because earlier, even if you wrote multi-threaded code, Python could only run one thread at a time, giving no performance benefit. But now, Python can run your multi-threaded code in parallel. And uv fully supports it!

120

510

5K

Apoorv Khandelwal

@apoorvkh

25 days

In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵

4

27

183

Apoorv Khandelwal

@apoorvkh

30 days

Our “academic pre-training” paper was accepted to COLM! I’ll be presenting at the Tuesday (11 AM) poster session!

Apoorv Khandelwal

@apoorvkh

1 year

Wondering how long it takes to train a 1B-param LM from scratch on your GPUs? 🧵 See our paper to learn about the current state of academic compute and how to efficiently train models! Use our code to test your own models/GPUs! https://t.co/hvrjwlApN8 https://t.co/1JnEe2CCLr

0

3

19

Brown CS

@BrownCSDept

2 months

This summer, sixteen of the nation’s future tech and policy leaders came to @BrownUniversity for a program that's the first of its kind worldwide, the CNTR's AI Policy Summer School: https://t.co/70AqAsavWI

1

2

4

Brown CS

@BrownCSDept

2 months

Formerly a @BrownCSDept postdoctoral researcher advised by @ShriramKMurthi, Will Crichton (@tonofcrates) returns this fall as assistant professor. He’s one of two recent hires in the multi-year CS With Impact campaign, our largest expansion to date: https://t.co/vxAu8WTx13

2

5

63

Stefanie Tellex

@StefanieTellex

2 months

I took a quadruped robot from Brown to 7 schools and a library this year, age range 3 years old to 13.

whattotelltherobot.com

There are seven senses, not five.

0

4

12

Brown Data Science Institute

@Brown_DSI

2 months

The Center for Technological Responsibility, Re-imagination and Redesign (CNTR) at DSI is leading the charge in tech & AI policy education with its new program: the CNTR Summer School. Read more about this innovative new program at Brown: https://t.co/Bke1JbcbAm

cntr.brown.edu

The Center for Technological Responsibility, Re-imagination, and Redesign (CNTR)’s Tech & Policy Summer School brings together a new generation of technology policymakers to bridge the gap between...

0

5

4

will brown

@willccbb

3 months

i'm increasingly convinced that "transformative ai" is going to look like an abundance of specialized models for everything from drug design to weather sims to robotics to supply chains, not one agent to rule them all. we're going to need a lot more ai researchers

113

114

2K

jessica dai

@jessicadai_

3 months

hey wasn't this the same company that made a beautiful shiny "research" post about how AI evals should include error bars or something like that. or did they decide the CLT doesn't apply when it would imply no effect https://t.co/HXddeYeIyO

Anthropic

@AnthropicAI

3 months

Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning.

12

29

826

Brown CS

@BrownCSDept

3 months

@BrownCSDept faculty members Ellie Pavlick and Suresh Venkatasubramanian (@geomblog) have just received a $20M @NSF grant to found ARIA, a national institute to develop intuitive, trustworthy AI assistants. Learn more at Brown CS News: https://t.co/fGqXCiyKrt

1

5

49

Brown University

@BrownUniversity

3 months

With a $20 million grant from the @NSF, Brown University researchers will lead an artificial intelligence research institute aimed at developing a new generation of AI assistants for use in mental and behavioral health. https://t.co/TneZjtix4O

brown.edu

A new institute, based at Brown and supported by a $20 million National Science Foundation grant, will convene researchers to guide development of a new generation of AI assistants for use in mental...

1

11

22

Taco Cohen

@TacoCohen

3 months

What I look for when hiring? EXTREME PARANOIA about code and data

15

13

316

Snorkel AI

@SnorkelAI

3 months

Not all benchmarks are created equal. We built a PhD-level multiple-choice test across 1,000+ subdomains, STEM, humanities, pro fields. Top LLMs? Scored <20%. This is what it takes to test advanced reasoning. Built with Snorkel’s Expert Data-as-a-Service. #LLM #GenAI

0

2

8

Lewis Tunstall

@_lewtun

3 months

An underappreciated fact about using formal methods like Lean is that it enables large-scale *collaboration* among mathematicians & potentially future AI agents. Why? Well, you can decompose a large proof into separate components that can be proven independently with robust

1

7

49

Alex Ratner

@ajratner

3 months

Thanks @lateinteraction ! Every time I think about the gazillion prompt / systems engineering tweaks that also go into making an AI system work I think about how early you were with @DSPyOSS :) Shared theme: find the key human input and make it programmatic.

Omar Khattab

@lateinteraction

3 months

Every time I think about what it takes to systematically organize the gazillion training tasks that together make a great foundation model, my appreciation for how early @SnorkelAI was increases.

2

4

37

Alex Ratner

@ajratner

3 months

America’s innovative edge makes us great—tell Congress: https://t.co/tt7Pxl1tQD Check out (and help!) push this nonpartisan campaign for investing in our most critical national edge! #ProtectScience #InnovationMakesAmericaGreat

0

2

6

Brown CS

@BrownCSDept

4 months

@diana_freed has received a CRA Trustworthy AI Research Fellowship, supporting early-career computing researchers who bring interdisciplinary expertise from the social sciences to infuse ethical and societal perspectives into Trustworthy AI development: https://t.co/HRcgZ3Tbxf

0

1

7