Alex Robey
@AlexRobey23
Followers: 1K · Following: 12K · Media: 104 · Statuses: 667
AI researcher. Current postdoc at @mldcmu, Ph.D. from @Penn, B.S. & B.A. from @Swarthmore, working with @GraySwanAI, formerly @GoogleAI, @Livermore_Lab.
Pittsburgh, PA
Joined July 2020
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
21
145
401
🤖 Robots rarely see the world's true state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
2
38
133
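A minimal sketch of the "distill" option mentioned in the thread above: a privileged teacher that sees the full simulator state supervises a student policy that only receives partial, noisy observations (teacher-student imitation). The policy names, environment interface, and the MSE imitation loss are assumptions for illustration, not details from the paper.

```python
# Hedged sketch: distilling a privileged teacher into a partially observed student.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, full_state, partial_obs):
    with torch.no_grad():
        target_action = teacher(full_state)        # expert acts on the privileged full state
    pred_action = student(partial_obs)             # student acts on what the robot would actually see
    loss = F.mse_loss(pred_action, target_action)  # behavior cloning on the expert's action
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```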
My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
6
20
198
Very interesting insights into when and why synthetic data (although imperfect and biased) can boost the performance of statistical inference!! 📈📈
💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data.
0
4
13
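One hedged way to picture the moment interaction described in the quoted tweet: a small real sample corrects a large but possibly biased synthetic sample, with the mixing weight set by how consistent their first moments look. This is a generic control-variate-style sketch with a heuristic weight, not necessarily the estimator from the paper.

```python
# Illustrative only: combining real and synthetic samples to estimate a mean.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(loc=1.0, scale=1.0, size=100)          # small real sample (unbiased, noisy)
synthetic = rng.normal(loc=1.2, scale=1.0, size=10_000)  # large simulator sample (low variance, biased)

mean_real = real.mean()
mean_syn = synthetic.mean()

# Heuristic weight: trust synthetic data more when its moment is consistent with the real one.
var_real = real.var(ddof=1) / len(real)
bias_sq = (mean_syn - mean_real) ** 2
w = var_real / (var_real + bias_sq)

combined = (1 - w) * mean_real + w * mean_syn
print(mean_real, mean_syn, combined)
```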
We're at #COLM2025 to present our work on building diverse reasoning models by weight ensembling. If you're curious about improving test-time scaling + theoretical limits, come talk to @xingyudang and @AdtRaghunathan at Poster #58 during the Thursday 11 AM poster session!
2
25
173
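For readers unfamiliar with weight ensembling, here is a sketch of the basic operation: interpolating the parameters of two checkpoints that share an architecture (model-soup style averaging). Whether this matches the paper's exact recipe is an assumption, and the checkpoint filenames are hypothetical.

```python
# Sketch of weight interpolation between two checkpoints with identical architectures.
# Assumes floating-point parameters/buffers.
import torch

def interpolate_state_dicts(sd_a, sd_b, lam=0.5):
    return {k: (1 - lam) * sd_a[k] + lam * sd_b[k] for k in sd_a}

# usage (hypothetical checkpoints):
# model.load_state_dict(interpolate_state_dicts(torch.load("ckpt_a.pt"),
#                                               torch.load("ckpt_b.pt"), lam=0.3))
```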
🚨 Reminder: 45 days left to submit to L4DC 2026! 🚨 We encourage all researchers to submit their work to the Learning for Dynamics & Control Conference (L4DC 2026), which will take place at USC in Los Angeles on June 17–19, 2026. 🗓️ Paper submission deadline: Nov. 8, 2025
1
3
10
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.
6
38
175
One thing years of memorization research has made clear: unlearning is fundamentally hard. Neurons are polysemantic & concepts are massively distributed. There’s no clean 'delete'. We need architectures that are "unlearnable by design". Introducing Memorization Sinks 🛁⬇️
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.
2
16
185
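A toy sketch of the general idea as described in the thread: reserve a slice of hidden units that is only activated by a per-sequence gate during training, so sequence-specific memorization is steered into those "sink" units and can be ablated at inference. All module names, dimensions, and gating details below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of a "memorization sink" style MLP block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLPWithMemorizationSink(nn.Module):
    def __init__(self, d_model=512, d_shared=2048, d_sink=256, n_sequences=100_000):
        super().__init__()
        self.up = nn.Linear(d_model, d_shared + d_sink)
        self.down = nn.Linear(d_shared + d_sink, d_model)
        self.d_shared = d_shared
        # One gate row per training sequence: each sequence opens its own sink units.
        self.sink_gate = nn.Embedding(n_sequences, d_sink)

    def forward(self, x, seq_id=None):
        # x: (batch, seq_len, d_model); seq_id: (batch,) long tensor during training.
        h = F.gelu(self.up(x))
        shared, sink = h[..., :self.d_shared], h[..., self.d_shared:]
        if seq_id is None:
            # Inference / "forget" mode: sink units are ablated entirely.
            sink = torch.zeros_like(sink)
        else:
            gate = torch.sigmoid(self.sink_gate(seq_id)).unsqueeze(1)
            sink = sink * gate
        return self.down(torch.cat([shared, sink], dim=-1))
```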
🚨 Super excited to finally share our Safety Pretraining work — along with all the artifacts (safe data, models, code)! In this thread 🧵, I’ll walk through our journey — the key intermediate observations and lessons, and how they helped shape our final pipeline.
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
3
13
67
@geteviapp @dylanjsam @goyalsachin007 @AlexRobey23 @yashsavani_ @yidingjiang @andyzou_jiaming @zacharylipton @zicokolter This is exactly what the second bar in the results shows! https://t.co/CGJABEz83I
A simple approach could be to filter out all harmful examples. However, this risks producing models that have no knowledge about sensitive topics. In fact, in our experiments, we observe that training only on a safe subset of data leads to worse models. 📉 (3/n)
0
2
5
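To make the trade-off in the quoted thread concrete, here is an illustrative sketch contrasting a hard filter, which drops every flagged document, with a softer pipeline that keeps flagged text but tags it so the model still sees sensitive topics in a labeled context. The safety scorer and the tag token are hypothetical placeholders, not the paper's pipeline.

```python
# Illustrative data-curation sketch; score_fn is a hypothetical safety scorer in [0, 1].
def hard_filter(docs, score_fn, threshold=0.5):
    # Baseline: discard anything the scorer considers unsafe.
    return [d for d in docs if score_fn(d) < threshold]

def annotate_instead_of_drop(docs, score_fn, threshold=0.5, tag="<unsafe_context>"):
    # Alternative: keep flagged text but prepend a context tag during pretraining.
    return [f"{tag} {d}" if score_fn(d) >= threshold else d for d in docs]
```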
Check out our approach to 𝐬𝐚𝐟𝐞𝐭𝐲 𝐩𝐫𝐞𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠, where we pretrain 2B models from scratch on filtered web data to be safe by construction 🚀
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
0
2
18
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
9
87
469
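A hedged sketch of what an outcome-level exploration bonus could look like: augment the task reward with a count-based novelty term over final answers, so the policy is nudged toward producing distinct outcomes rather than collapsing onto one. This is a generic illustration, not necessarily the paper's formulation.

```python
# Illustrative outcome-based exploration bonus for RL post-training.
from collections import Counter
import math

outcome_counts = Counter()

def reward_with_outcome_bonus(answer, is_correct, bonus_scale=0.1):
    outcome_counts[answer] += 1
    base = 1.0 if is_correct else 0.0
    bonus = bonus_scale / math.sqrt(outcome_counts[answer])  # rarer final answers earn more
    return base + bonus
```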
1/Excited to share the first in a series of my research updates on LLM pretraining🚀. Our new work shows *distilled pretraining*—increasingly used to train deployable models—has trade-offs: ✅ Boosts test-time scaling ⚠️ Weakens in-context learning ✨ Needs tailored data curation
5
65
330
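For context, a minimal sketch of a standard distillation objective of the kind distilled pretraining builds on: the student mixes next-token cross-entropy with a KL term toward the teacher's token distribution. The temperature and mixing weight are illustrative hyperparameters; the paper's exact objective may differ.

```python
# Generic knowledge-distillation loss for next-token prediction.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, T=1.0):
    # Hard-label cross-entropy on the data.
    ce = F.cross_entropy(student_logits.reshape(-1, student_logits.size(-1)),
                         targets.reshape(-1))
    # Soft-label KL toward the teacher's (temperature-scaled) distribution.
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.log_softmax(teacher_logits / T, dim=-1),
                  log_target=True, reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kl
```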
Embodied AI isn’t just chatbots with arms! It’s a new frontier of risk. Our paper shows how badly current laws lag behind and what needs to change before deployment accelerates.
🚨 NEW PAPER 🚨: Embodied AI (EAI, such as self-driving cars, AI-powered drones, humanoid robots, etc.) is here, but our policies are dangerously behind. We analyzed the risks from these EAI systems and found massive gaps in governance. 🧵
0
2
8
New pre-print! Embodied AI is the next wave of technological disruption. But what risks do EAI systems pose and how should policymakers respond? https://t.co/4XYs5CO1wr Coupled with excitement and opportunity, embodied AI poses distinct risks, from physical harm to societal…
1
2
7
A delight to work with my wonderful collaborators @AlexRobey23, @FazlBarez, Luciano Floridi and @jakobmokander. All feedback welcome on this preprint! https://t.co/J6JagTCTA0
0
4
7
Interested? The full paper lays out a risk taxonomy, reviews existing governance frameworks, and offers concrete steps to close gaps. Led by the fantastic @_perloj along with @FazlBarez, Luciano Floridi, and @jakobmokander. Paper link: arxiv.org
The field of embodied AI (EAI) is rapidly advancing. Unlike virtual AI, EAI systems can exist in, learn from, reason about, and act in the physical world. With recent advances in AI models and...
0
1
5
Recent AI policy—shaped by safety-related concerns—mostly targets virtual models (e.g., LLMs). But embodied AI introduces new risks, including the possibility of harm in the physical world. And current US/EU/UK frameworks rarely name or scope these embodied risks explicitly.
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
1
1
2
Embodied AI (drones, humanoids, etc.) is advancing rapidly, but governance has not kept pace. Our new paper asks: What should future policy look like for AIs that can act autonomously in the physical world? See below for a short 🤖🧵:
🚨 NEW PAPER 🚨: Embodied AI (EAI, such as self-driving cars, AI-powered drones, humanoid robots, etc.) is here, but our policies are dangerously behind. We analyzed the risks from these EAI systems and found massive gaps in governance. 🧵
2
3
15
🌻 Excited to announce that I’ve moved to NYC to start as an Assistant Prof/Faculty Fellow at @nyuniversity! If you’re in the area, reach out & let’s chat! Would love coffee & tea recs as well 🍵
75
19
1K
Excited to share some life updates: I successfully defended my thesis, written under the guidance of my advisor Prof. Jaime Fernández Fisac (@jaime_fisac @EPrinceton). I'll join the Johns Hopkins Department of Computer Science (@JHUCompSci) next summer as an Assistant Professor.
5
11
135