Stephen Bach Profile
Stephen Bach

@stevebach

Followers: 2K
Following: 4K
Media: 33
Statuses: 2K

Asst. prof. @BrownCSDept. Working on improving how humans teach computers. Weak supervision, zero-shot learning, few-shot learning, and high-level knowledge.

Joined August 2007
@yong_zhengxin
Yong Zheng-Xin (Yong)
3 days
🚨 Reasoning models can “self-jailbreak”: they recognize a request is harmful, invent a reason why it’s fine, then help with it. We found that after training on benign math/code reasoning, models emergently start to reason themselves out of safety alignment. 🧵👇
1
6
16
@2plus2make5
Emma Pierson
15 days
Do you have many models to choose from and little labeled data with which to evaluate them? Check out our #neurips2025 paper, which presents a method to estimate model performance more accurately than previous methods using both labeled + unlabeled data.
@dmshanmugam
Divya Shanmugam
15 days
New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.
2
12
107
@elmelis
David Alvarez Melis
17 days
📄 New preprint alert: We study 🪃Boomerang Distillation🪃, a surprising phenomenon that allows generating a family of pre-trained LLMs of intermediate sizes from a single teacher–student pair — 𝐧𝐨 𝐞𝐱𝐭𝐫𝐚 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐫𝐞𝐪𝐮𝐢𝐫𝐞𝐝! 🧵👇
2
25
120
@_avichawla
Avi Chawla
18 days
Finally, Python 3.14 lets you disable GIL! It's a big deal because earlier, even if you wrote multi-threaded code, Python could only run one thread at a time, giving no performance benefit. But now, Python can run your multi-threaded code in parallel. And uv fully supports it!
120
510
5K
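The claim in the post above can be checked with a minimal sketch. It assumes nothing beyond the standard library: `gil_status`, `count_primes`, and `parallel_prime_counts` are hypothetical helper names chosen for illustration. Note that the GIL is only off in the separate free-threaded build of CPython; a standard build still runs one thread of Python bytecode at a time, so the thread pool below would only show a speedup on a free-threaded interpreter.

```python
import sys
import sysconfig
from concurrent.futures import ThreadPoolExecutor


def gil_status() -> str:
    """Describe whether this interpreter is running with the GIL."""
    # Py_GIL_DISABLED is 1 only in free-threaded builds (CPython 3.13+).
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return "standard build (GIL always enabled)"
    # sys._is_gil_enabled() exists from 3.13; fall back to True if absent.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return "free-threaded build, GIL " + ("enabled" if gil_enabled else "disabled")


def count_primes(limit: int) -> int:
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def parallel_prime_counts(limits, workers: int = 4):
    """Run count_primes for each limit on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print(gil_status())
    print(parallel_prime_counts([50_000] * 4))
```

Timing `parallel_prime_counts` against a plain loop on both a standard and a free-threaded interpreter is one way to see the difference the post describes; the results are the same either way, only the wall-clock time should change.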
@apoorvkh
Apoorv Khandelwal
25 days
In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵
4
27
183
@apoorvkh
Apoorv Khandelwal
30 days
Our “academic pre-training” paper was accepted to COLM! I’ll be presenting at the Tuesday (11 AM) poster session!
@apoorvkh
Apoorv Khandelwal
1 year
Wondering how long it takes to train a 1B-param LM from scratch on your GPUs? 🧵 See our paper to learn about the current state of academic compute and how to efficiently train models! Use our code to test your own models/GPUs! https://t.co/hvrjwlApN8 https://t.co/1JnEe2CCLr
0
3
19
@BrownCSDept
Brown CS
2 months
This summer, sixteen of the nation’s future tech and policy leaders came to @BrownUniversity for a program that's the first of its kind worldwide, the CNTR's AI Policy Summer School: https://t.co/70AqAsavWI
1
2
4
@BrownCSDept
Brown CS
2 months
Formerly a @BrownCSDept postdoctoral researcher advised by @ShriramKMurthi, Will Crichton (@tonofcrates) returns this fall as assistant professor. He’s one of two recent hires in the multi-year CS With Impact campaign, our largest expansion to date: https://t.co/vxAu8WTx13
2
5
63
@StefanieTellex
Stefanie Tellex
2 months
I took a quadruped robot from Brown to 7 schools and a library this year, age range 3 years old to 13.
whattotelltherobot.com
There are seven senses, not five.
0
4
12
@Brown_DSI
Brown Data Science Institute
2 months
The Center for Technological Responsibility, Re-imagination and Redesign (CNTR) at DSI is leading the charge in tech & AI policy education with its new program: the CNTR Summer School. Read more about this innovative new program at Brown: https://t.co/Bke1JbcbAm
cntr.brown.edu
The Center for Technological Responsibility, Re-imagination, and Redesign (CNTR)’s Tech & Policy Summer School brings together a new generation of technology policymakers to bridge the gap between...
0
5
4
@willccbb
will brown
3 months
i'm increasingly convinced that "transformative ai" is going to look like an abundance of specialized models for everything from drug design to weather sims to robotics to supply chains, not one agent to rule them all. we're going to need a lot more ai researchers
113
114
2K
@jessicadai_
jessica dai
3 months
hey wasn't this the same company that made a beautiful shiny "research" post about how AI evals should include error bars or something like that. or did they decide the CLT doesn't apply when it would imply no effect https://t.co/HXddeYeIyO
@AnthropicAI
Anthropic
3 months
Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning.
12
29
826
@BrownCSDept
Brown CS
3 months
@BrownCSDept faculty members Ellie Pavlick and Suresh Venkatasubramanian (@geomblog) have just received a $20M @NSF grant to found ARIA, a national institute to develop intuitive, trustworthy AI assistants. Learn more at Brown CS News: https://t.co/fGqXCiyKrt
1
5
49
@BrownUniversity
Brown University
3 months
With a $20 million grant from the @NSF, Brown University researchers will lead an artificial intelligence research institute aimed at developing a new generation of AI assistants for use in mental and behavioral health. https://t.co/TneZjtix4O
brown.edu
A new institute, based at Brown and supported by a $20 million National Science Foundation grant, will convene researchers to guide development of a new generation of AI assistants for use in mental...
1
11
22
@TacoCohen
Taco Cohen
3 months
What I look for when hiring? EXTREME PARANOIA about code and data
15
13
316
@SnorkelAI
Snorkel AI
3 months
Not all benchmarks are created equal. We built a PhD-level multiple-choice test across 1,000+ subdomains, STEM, humanities, pro fields. Top LLMs? Scored <20%. This is what it takes to test advanced reasoning. Built with Snorkel’s Expert Data-as-a-Service. #LLM #GenAI
0
2
8
@_lewtun
Lewis Tunstall
3 months
An under appreciated fact about using formal methods like Lean is that it enables large-scale *collaboration* among mathematicians & potentially future AI agents. Why? Well, you can decompose a large proof into separate components that can be proven independently with robust
1
7
49
@ajratner
Alex Ratner
3 months
Thanks @lateinteraction ! Every time I think about the gazillion prompt / systems engineering tweaks that also go into making an AI system work I think about how early you were with @DSPyOSS :) Shared theme: find the key human input and make it programmatic.
@lateinteraction
Omar Khattab
3 months
Every time I think about what it takes to systematically organize the gazillion training tasks that together make a great foundation model, my appreciation for how early @SnorkelAI was increases.
2
4
37
@ajratner
Alex Ratner
3 months
America’s innovative edge makes us great—tell Congress: https://t.co/tt7Pxl1tQD Check out (and help!) push this nonpartisan campaign for investing in our most critical national edge! #ProtectScience #InnovationMakesAmericaGreat
0
2
6
@BrownCSDept
Brown CS
4 months
@diana_freed has received a CRA Trustworthy AI Research Fellowship, supporting early-career computing researchers who bring interdisciplinary expertise from the social sciences to infuse ethical and societal perspectives into Trustworthy AI development: https://t.co/HRcgZ3Tbxf
0
1
7