Alex Robey
@AlexRobey23
Followers: 1K · Following: 12K · Media: 104 · Statuses: 667
AI researcher. Current postdoc at @mldcmu, Ph.D. from @Penn, B.S. & B.A. from @Swarthmore, working with @GraySwanAI, formerly @GoogleAI, @Livermore_Lab.
Pittsburgh, PA
Joined July 2020
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
21
145
401
🤖 Robots rarely see the world's true state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
2
38
133
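A minimal sketch of the "distill" option mentioned in the thread above: a privileged teacher that sees the full simulator state supervises a student policy that only receives partial, noisy observations (teacher-student imitation). The policy names, environment interface, and the MSE imitation loss are assumptions for illustration, not details from the paper.

```python
# Hedged sketch: distilling a privileged teacher into a partially observed student.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, full_state, partial_obs):
    with torch.no_grad():
        target_action = teacher(full_state)        # expert acts on the privileged full state
    pred_action = student(partial_obs)             # student acts on what the robot would actually see
    loss = F.mse_loss(pred_action, target_action)  # behavior cloning on the expert's action
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```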
My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
6
20
198
Very interesting insights into when and why synthetic data (although imperfect and biased) can boost the performance of statistical inference!! 📈📈
💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data.
0
4
13
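One hedged way to picture the moment interaction described in the quoted tweet: a small real sample corrects a large but possibly biased synthetic sample, with the mixing weight set by how consistent their first moments look. This is a generic control-variate-style sketch with a heuristic weight, not necessarily the estimator from the paper.

```python
# Illustrative only: combining real and synthetic samples to estimate a mean.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(loc=1.0, scale=1.0, size=100)          # small real sample (unbiased, noisy)
synthetic = rng.normal(loc=1.2, scale=1.0, size=10_000)  # large simulator sample (low variance, biased)

mean_real = real.mean()
mean_syn = synthetic.mean()

# Heuristic weight: trust synthetic data more when its moment is consistent with the real one.
var_real = real.var(ddof=1) / len(real)
bias_sq = (mean_syn - mean_real) ** 2
w = var_real / (var_real + bias_sq)

combined = (1 - w) * mean_real + w * mean_syn
print(mean_real, mean_syn, combined)
```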
We're at #COLM2025 to present our work on building diverse reasoning models by weight ensembling. If you're curious about improving test-time scaling + theoretical limits, come talk to @xingyudang and @AdtRaghunathan at Poster #58 during the Thursday 11 AM poster session!
2
25
173
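For readers unfamiliar with weight ensembling, here is a sketch of the basic operation: interpolating the parameters of two checkpoints that share an architecture (model-soup style averaging). Whether this matches the paper's exact recipe is an assumption, and the checkpoint filenames are hypothetical.

```python
# Sketch of weight interpolation between two checkpoints with identical architectures.
# Assumes floating-point parameters/buffers.
import torch

def interpolate_state_dicts(sd_a, sd_b, lam=0.5):
    return {k: (1 - lam) * sd_a[k] + lam * sd_b[k] for k in sd_a}

# usage (hypothetical checkpoints):
# model.load_state_dict(interpolate_state_dicts(torch.load("ckpt_a.pt"),
#                                               torch.load("ckpt_b.pt"), lam=0.3))
```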
🚨 Reminder: 45 days left to submit to L4DC 2026! 🚨 We encourage all researchers to submit their work to the Learning for Dynamics & Control Conference (L4DC 2026), which will take place at USC in Los Angeles on June 17–19, 2026. 🗓️ Paper submission deadline: Nov. 8, 2025
1
3
10
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.
6
38
175
One thing years of memorization research has made clear: unlearning is fundamentally hard. Neurons are polysemantic & concepts are massively distributed. There’s no clean 'delete'. We need architectures that are "unlearnable by design". Introducing Memorization Sinks 🛁⬇️
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.
2
16
185
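A toy sketch of the general idea as described in the thread: reserve a slice of hidden units that is only activated by a per-sequence gate during training, so sequence-specific memorization is steered into those "sink" units and can be ablated at inference. All module names, dimensions, and gating details below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of a "memorization sink" style MLP block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLPWithMemorizationSink(nn.Module):
    def __init__(self, d_model=512, d_shared=2048, d_sink=256, n_sequences=100_000):
        super().__init__()
        self.up = nn.Linear(d_model, d_shared + d_sink)
        self.down = nn.Linear(d_shared + d_sink, d_model)
        self.d_shared = d_shared
        # One gate row per training sequence: each sequence opens its own sink units.
        self.sink_gate = nn.Embedding(n_sequences, d_sink)

    def forward(self, x, seq_id=None):
        # x: (batch, seq_len, d_model); seq_id: (batch,) long tensor during training.
        h = F.gelu(self.up(x))
        shared, sink = h[..., :self.d_shared], h[..., self.d_shared:]
        if seq_id is None:
            # Inference / "forget" mode: sink units are ablated entirely.
            sink = torch.zeros_like(sink)
        else:
            gate = torch.sigmoid(self.sink_gate(seq_id)).unsqueeze(1)
            sink = sink * gate
        return self.down(torch.cat([shared, sink], dim=-1))
```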
🚨 Super excited to finally share our Safety Pretraining work — along with all the artifacts (safe data, models, code)! In this thread 🧵, I’ll walk through our journey — the key intermediate observations and lessons, and how they helped shape our final pipeline.
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
3
13
67
@geteviapp @dylanjsam @goyalsachin007 @AlexRobey23 @yashsavani_ @yidingjiang @andyzou_jiaming @zacharylipton @zicokolter This is exactly what the second bar in the results shows! https://t.co/CGJABEz83I
A simple approach could be to filter out all harmful examples. However, this risks producing models that have no knowledge about sensitive topics. In fact, in our experiments, we observe that training only on a safe subset of data leads to worse models. 📉 (3/n)
0
2
5
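To make the trade-off in the quoted thread concrete, here is an illustrative sketch contrasting a hard filter, which drops every flagged document, with a softer pipeline that keeps flagged text but tags it so the model still sees sensitive topics in a labeled context. The safety scorer and the tag token are hypothetical placeholders, not the paper's pipeline.

```python
# Illustrative data-curation sketch; score_fn is a hypothetical safety scorer in [0, 1].
def hard_filter(docs, score_fn, threshold=0.5):
    # Baseline: discard anything the scorer considers unsafe.
    return [d for d in docs if score_fn(d) < threshold]

def annotate_instead_of_drop(docs, score_fn, threshold=0.5, tag="<unsafe_context>"):
    # Alternative: keep flagged text but prepend a context tag during pretraining.
    return [f"{tag} {d}" if score_fn(d) >= threshold else d for d in docs]
```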
Check out our approach to 𝐬𝐚𝐟𝐞𝐭𝐲 𝐩𝐫𝐞𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠, where we pretrain 2B models from scratch on filtered web data to be safe by construction 🚀
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
0
2
18
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
9
87
469
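A hedged sketch of what an outcome-level exploration bonus could look like: augment the task reward with a count-based novelty term over final answers, so the policy is nudged toward producing distinct outcomes rather than collapsing onto one. This is a generic illustration, not necessarily the paper's formulation.

```python
# Illustrative outcome-based exploration bonus for RL post-training.
from collections import Counter
import math

outcome_counts = Counter()

def reward_with_outcome_bonus(answer, is_correct, bonus_scale=0.1):
    outcome_counts[answer] += 1
    base = 1.0 if is_correct else 0.0
    bonus = bonus_scale / math.sqrt(outcome_counts[answer])  # rarer final answers earn more
    return base + bonus
```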
1/Excited to share the first in a series of my research updates on LLM pretraining🚀. Our new work shows *distilled pretraining*—increasingly used to train deployable models—has trade-offs: ✅ Boosts test-time scaling ⚠️ Weakens in-context learning ✨ Needs tailored data curation
5
65
330
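For context, a minimal sketch of a standard distillation objective of the kind distilled pretraining builds on: the student mixes next-token cross-entropy with a KL term toward the teacher's token distribution. The temperature and mixing weight are illustrative hyperparameters; the paper's exact objective may differ.

```python
# Generic knowledge-distillation loss for next-token prediction.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, T=1.0):
    # Hard-label cross-entropy on the data.
    ce = F.cross_entropy(student_logits.reshape(-1, student_logits.size(-1)),
                         targets.reshape(-1))
    # Soft-label KL toward the teacher's (temperature-scaled) distribution.
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.log_softmax(teacher_logits / T, dim=-1),
                  log_target=True, reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kl
```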
Embodied AI isn’t just chatbots with arms! It’s a new frontier of risk. Our paper shows how badly current laws lag behind and what needs to change before deployment accelerates.
🚨 NEW PAPER 🚨: Embodied AI (EAI, such as self-driving cars, AI-powered drones, humanoid robots, etc.) is here, but our policies are dangerously behind. We analyzed the risks from these EAI systems and found massive gaps in governance. 🧵
0
2
8
New pre-print! Embodied AI is the next wave of technological disruption. But what risks do EAI systems pose and how should policymakers respond? https://t.co/4XYs5CO1wr Coupled with excitement and opportunity, embodied AI poses distinct risks, from physical harm to societal…
1
2
7
A delight to work with my wonderful collaborators @AlexRobey23, @FazlBarez, Luciano Floridi and @jakobmokander. All feedback welcome on this preprint! https://t.co/J6JagTCTA0
0
4
7
Interested? The full paper lays out a risk taxonomy, reviews existing governance frameworks, and offers concrete steps to close gaps. Led by the fantastic @_perloj along with @FazlBarez, Luciano Floridi, and @jakobmokander. Paper link: arxiv.org
The field of embodied AI (EAI) is rapidly advancing. Unlike virtual AI, EAI systems can exist in, learn from, reason about, and act in the physical world. With recent advances in AI models and...
0
1
5
Recent AI policy—shaped by safety-related concerns—mostly targets virtual models (e.g., LLMs). But embodied AI introduces new risks, including the possibility of harm in the physical world. And current US/EU/UK frameworks rarely name or scope these embodied risks explicitly.
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
1
1
2
Embodied AI (drones, humanoids, etc.) is advancing rapidly, but governance has not kept pace. Our new paper asks: What should future policy look like for AIs that can act autonomously in the physical world? See below for a short 🤖🧵:
🚨 NEW PAPER 🚨: Embodied AI (EAI, such as self-driving cars, AI-powered drones, humanoid robots, etc.) is here, but our policies are dangerously behind. We analyzed the risks from these EAI systems and found massive gaps in governance. 🧵
2
3
15
🌻 Excited to announce that I’ve moved to NYC to start as an Assistant Prof/Faculty Fellow at @nyuniversity! If you’re in the area, reach out & let’s chat! Would love coffee & tea recs as well 🍵
75
19
1K
Excited to share some life updates: I successfully defended my thesis, written under the guidance of my advisor Prof. Jaime Fernández Fisac (@jaime_fisac @EPrinceton). I'll join the Johns Hopkins Department of Computer Science (@JHUCompSci) next summer as an Assistant Professor.
5
11
135