
Zico Kolter
@zicokolter
Followers
24K
Following
829
Media
38
Statuses
643
Professor and Head of Machine Learning Department at @CarnegieMellon. Board member @OpenAI and @Qualcomm. Chief Technical Advisor @GraySwanAI.
Pittsburgh, PA
Joined March 2017
Gray Swan AI Arena sponsored by @hackthebox_eu present the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
7
23
151
Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
19
204
1K
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
7
88
342
Beyond humbled to be on this year's #TIME100AI AI can be an asset for climate & energy - but only if its development is guided by actual climate needs & planetary limits. Shoutout to those in the community working to shape a responsible, equitable, climate-aligned AI future 🌍💪
24
17
261
This semester, Matt Gormley & I are co-teaching CMU's Generative AI course! Today we discussed the Transformer architecture & Multi-Headed Attention. Follow along 👇 if you want to learn more about the tech that's powering today's AI, from ChatGPT to reasoning models to agents!
5
8
135
Today we release gpt-oss-120b and gpt-oss-20b—two open-weight LLMs that deliver strong performance and agentic tool use. Before release, we ran a first of its kind safety analysis where we fine-tuned the models to intentionally maximize their bio and cyber capabilities 🧵
110
365
3K
Open models can unlock huge benefits, and like any powerful technology, they carry misuse risks. Once the weights are released, there’s no pulling them back. This is why safety testing matters even more here. 1/
Today we release gpt-oss-120b and gpt-oss-20b—two open-weight LLMs that deliver strong performance and agentic tool use. Before release, we ran a first of its kind safety analysis where we fine-tuned the models to intentionally maximize their bio and cyber capabilities 🧵
4
4
80
Our open models are here. Both of them. https://t.co/9tFxefOXcg
openai.com
Advanced open-weight reasoning models to customize for any use case and run anywhere.
1K
3K
20K
1/ Updated now with nearly tight lower bounds—i.e., proofs showing when alignment becomes intractable, even for ideal agents. Key AI safety takeaways: 🧠 Too many values ⇒ makes alignment intractable 👁 Task-space growth ⇒ oversight failure 🤖 Bounded agents need the right
Are there fundamental barriers to AI alignment once we develop generally-capable AI agents? We mathematically prove the answer is *yes*, and outline key properties for a "safe yet capable" agent. 🧵👇 Paper: https://t.co/6ogluaAQCm
1
2
14
We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵
71
393
2K
A mental model I find useful: all data acquisition (web scrapes, synthetic data, RL rollouts, etc.) is really an exploration problem 🔍. This perspective has some interesting implications for where AI is heading. Wrote down some thoughts: https://t.co/VQLrYuJVAR
yidingjiang.github.io
This post explores the idea that the next breakthroughs in AI may hinge more on how we collect experience through exploration, and less on how many parameters and data points we have.
5
59
429
now the code is up here:
github.com
JAX implementation of MeanFlow. Contribute to Gsunshine/meanflow development by creating an account on GitHub.
Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In a word, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,
2
17
71
🚨Excited to release OS-Harm! 🚨 The safety of computer use agents has been largely overlooked. We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm: 1. deliberate user misuse, 2. prompt injections, 3. model misbehavior.
3
31
110
Wrote my first blog post! I wanted to share a powerful yet under-recognized way to develop emotional maturity as a researcher: making it a habit to read about the ✨past ✨ and learn from it to make sense of the present
2
14
122
✨ Did you know that NOT using all generated rollouts in GRPO can boost your reasoning LLM? Meet PODS! We down-sample rollouts and train on just a fraction, delivering notable gains over vanilla GRPO. (1/7)
6
16
137
Introducing FLAME-MoE: a fully open platform for Mixture-of-Experts (MoE) research. All code, data, checkpoints, training logs, and evaluation results are public—across 7 different scales. Paper: https://t.co/NsSk603rPi Code: https://t.co/pLgXfWkJnB
2
22
61
Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In a word, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,
5
39
154
Excited to be talking today about how research into memorization provides a fundamentally different lens on safety!
For this week’s NLP Seminar, we are thrilled to host @pratyushmaini to talk about “What Memorization Research Taught Me About Safety” When: 5/8 Thurs 11am PT Non-Stanford affiliates registration form: https://t.co/G3IoKOFey7
3
9
100
A shorter version of the first three chapters of my thesis is accepted by ICML 2025. It provides a quick start for those interested in learning about the contexture theory. Check it out:
arxiv.org
Despite the empirical success of foundation models, we do not have a systematic characterization of the representations that these models learn. In this paper, we establish the contexture theory....
Why can foundation models transfer to so many downstream tasks? Will the scaling law end? Will pretraining end like Ilya Sutskever predicted? My PhD thesis builds the contexture theory to answer the above. Blog: https://t.co/MCIJifkU1Z Paper: https://t.co/RXVF7n7mHR 🧵1/12
1
2
37
Looking forward to giving a talk this Friday @OpenAI with @zhilifeng on some of our privacy & memorization research + how it applies to production LLMs! We've been gaining momentum on detecting, quantifying & erasing memorization; excited to explore its real-world impact!
0
12
104