Alex Robey Profile
Alex Robey

@AlexRobey23

Followers
1K
Following
12K
Media
104
Statuses
667

AI researcher. Current postdoc at @mldcmu, Ph.D. from @Penn, B.S. & B.A. from @Swarthmore, working with @GraySwanAI, formerly @GoogleAI, @Livermore_Lab.

Pittsburgh, PA
Joined July 2020
Don't wanna be here? Send us removal request.
@AlexRobey23
Alex Robey
1 year
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
21
145
401
@yus167
Yuda Song
13 days
🤖 Robots rarely see the true world's state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
2
38
133
@javirandor
Javier Rando
18 days
My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.
@AnthropicAI
Anthropic
18 days
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
6
20
198
@dylanjsam
Dylan Sam
18 days
Very interesting insights into understanding when and why synthetic data (although imperfect and biased) can boost the performance of statistical inference!! 📈📈
@yewonbyun_
Emily Byun
18 days
💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data
0
4
13
@_christinabaek
Christina Baek
19 days
We're at #COLM2025 to present our work on building diverse reasoning models by weight ensembling. If you're curious about improving test-time scaling + theoretical limits, come talk to @xingyudang and @AdtRaghunathan at our poster session Poster #58 on Thursday 11 AM!
2
25
173
@LarsLindemann2
Lars Lindemann
1 month
🚨 Reminder: 45 days left to submit to L4DC 2026! 🚨 We encourage all researchers to submit their work to the Learning for Dynamics & Control Conference (L4DC 2026) which will take place at USC in Los Angeles between June 17–19, 2026. 🗓️ Paper submission deadline: Nov. 8, 2025
1
3
10
@AdtRaghunathan
Aditi Raghunathan
1 month
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.
6
38
175
@pratyushmaini
Pratyush Maini
1 month
One thing years of memorization research has made clear: unlearning is fundamentally hard. Neurons are polysemantic & concepts are massively distributed. There’s no clean 'delete'. We need architectures that are "unlearnable by design". Introducing, Memorization Sinks 🛁⬇️
@AdtRaghunathan
Aditi Raghunathan
1 month
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.
2
16
185
@goyalsachin007
Sachin Goyal
1 month
🚨 Super excited to finally share our Safety Pretraining work — along with all the artifacts (safe data, models, code)! In this thread 🧵, I’ll walk through our journey — the key intermediate observations and lessons, and how they helped shape our final pipeline.
@dylanjsam
Dylan Sam
1 month
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
3
13
67
@pratyushmaini
Pratyush Maini
1 month
@dylanjsam
Dylan Sam
1 month
A simple approach could be to filter out all harmful examples. However, this risks producing models that have no knowledge about sensitive topics. In fact, in our experiments, we observe that training only on a safe subset of data leads to worse models. 📉 (3/n)
0
2
5
@AlexRobey23
Alex Robey
1 month
Check out our approach to 𝐬𝐚𝐟𝐞𝐭𝐲 𝐩𝐫𝐞𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 , where we pretrain 2B models from scratch on filtered web data to be safe by construction 🚀
@dylanjsam
Dylan Sam
1 month
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
0
2
18
@yus167
Yuda Song
2 months
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
9
87
469
@goyalsachin007
Sachin Goyal
2 months
1/Excited to share the first in a series of my research updates on LLM pretraining🚀. Our new work shows *distilled pretraining*—increasingly used to train deployable models—has trade-offs: ✅ Boosts test-time scaling ⚠️ Weakens in-context learning ✨ Needs tailored data curation
5
65
330
@FazlBarez
Fazl Barez
2 months
Embodied AI isn’t just chatbots with arms! It’s a new frontier of risk. Our paper shows how badly current laws lag behind and what needs to change before deployment accelerates.
@_perloj
Jared Perlo
2 months
🚨 NEW PAPER 🚨: Embodied AI (EAI, such as self-driving cars, AI-powered drones, humanoid robots, etc.) is here, but our policies are dangerously behind. We analyzed the risks from these EAI systems and found massive gaps in governance. 🧵
0
2
8
@jakobmokander
Jakob Mökander
2 months
New pre-print! Embodied AI is the next wave of technological disruption. But what risks do EAI systems pose and how should policymakers respond? https://t.co/4XYs5CO1wr Coupled with excitement and opportunity, Embodied AI pose distinct risks – from physical harm to societal
1
2
7
@_perloj
Jared Perlo
2 months
A delight to work with my wonderful collaborators @AlexRobey23 , @FazlBarez , Luciano Floridi and @jakobmokander. All feedback welcome on this preprint! https://t.co/J6JagTCTA0
0
4
7
@AlexRobey23
Alex Robey
2 months
Interested? The full paper lays out a risk taxonomy, reviews existing governance frameworks, and offers concrete steps to close gaps. Led by the fantastic @_perloj along with @FazlBarez, Luciano Floridi, and @jakobmokander. Paper link:
Tweet card summary image
arxiv.org
The field of embodied AI (EAI) is rapidly advancing. Unlike virtual AI, EAI systems can exist in, learn from, reason about, and act in the physical world. With recent advances in AI models and...
0
1
5
@AlexRobey23
Alex Robey
2 months
Recent AI policy—shaped by safety-related concerns—mostly targets virtual models (e.g., LLMs). But embodied AI introduces new risks, including the possibility of harm in the physical world. And current US/EU/UK frameworks rarely name or scope these embodied risks explicitly.
@AlexRobey23
Alex Robey
1 year
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
1
1
2
@AlexRobey23
Alex Robey
2 months
Embodied AI (drones, humanoids, etc.) is advancing rapidly, but governance has not kept pace. Our new paper asks: What should future policy look like for AIs that can act autonomously in the physical world? See below for a short 🤖🧵:
@_perloj
Jared Perlo
2 months
🚨 NEW PAPER 🚨: Embodied AI (EAI, such as self-driving cars, AI-powered drones, humanoid robots, etc.) is here, but our policies are dangerously behind. We analyzed the risks from these EAI systems and found massive gaps in governance. 🧵
2
3
15
@steph_milani
Stephanie Milani
2 months
🌻 Excited to announce that I’ve moved to NYC to start as an Assistant Prof/Faculty Fellow at @nyuniversity! If you’re in the area, reach out & let’s chat! Would love coffee & tea recs as well 🍵
75
19
1K
@HaiminHu
Haimin Hu
2 months
Excited to share some life updates: I successfully defended my thesis, written under the guidance of my advisor Prof. Jaime Fernández Fisac (@jaime_fisac @EPrinceton). I'll join the Johns Hopkins Department of Computer Science (@JHUCompSci) next summer as an Assistant Professor.
5
11
135