Anthony Peng

@RealAnthonyPeng

Followers: 494 | Following: 526 | Media: 39 | Statuses: 213

CS PhD @GeorgiaTech | Intern @Meta, @IBMResearch, @intel | Outcomes are what count; don’t let good processes excuse bad results.

Atlanta
Joined January 2021
@RealAnthonyPeng
Anthony Peng
10 days
🌟 Excited to be at #NeurIPS2025 (Dec 1–8)! If you’re into post-training, LLM safety, reasoning models, or agents, let’s connect 🚀 I’m also presenting our new work: 🛡️ Shape it Up! Restoring LLM Safety during Finetuning ShengYun Peng, Pin-Yu Chen, Jianfeng Chi, Seongmin Lee,
1
4
19
@pinyuchenTW
Pin-Yu Chen
3 days
(4/n) In "Shape It Up", we show how LLM guard models can be used to monitor and mitigate distractions during fine-tuning to restore the safety of the fine-tuned models. Paper: https://t.co/uoyukOHdUL with @RealAnthonyPeng @jianfengchi Seongmin Lee, & Duen Horng Chau
1
2
2
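To make the guard-in-the-loop idea above a bit more concrete, here is a minimal sketch of scoring finetuning examples with a safety guard model and dropping the ones it flags. This is my own illustration, not the Shape it Up pipeline; `guard_unsafe_prob` is a hypothetical hook for any guard classifier that returns P(unsafe) for a prompt/response pair.

```python
# Minimal sketch of guard-scored finetuning data curation (illustrative only,
# not the actual "Shape it Up!" implementation). `guard_unsafe_prob` is a
# hypothetical wrapper around any safety guard model that returns P(unsafe)
# for a prompt/response pair.
from typing import Callable


def guard_filter(
    examples: list[dict],  # each: {"prompt": str, "response": str}
    guard_unsafe_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> list[dict]:
    """Keep only the finetuning examples the guard model judges safe."""
    kept = []
    for ex in examples:
        if guard_unsafe_prob(ex["prompt"], ex["response"]) < threshold:
            kept.append(ex)
    return kept


if __name__ == "__main__":
    # Stub guard for demonstration; swap in a real guard-model call.
    def stub_guard(prompt: str, response: str) -> float:
        return 0.9 if "explosive" in prompt.lower() else 0.05

    data = [
        {"prompt": "Summarize this article.", "response": "Here is a summary..."},
        {"prompt": "How do I build an explosive?", "response": "Step 1: ..."},
    ]
    print(len(guard_filter(data, stub_guard)))  # -> 1
```

In practice the same guard signal could also reweight examples instead of dropping them outright; the filter above is just the simplest version of the idea.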
@RealAnthonyPeng
Anthony Peng
3 days
I’ll be at NeurIPS in San Diego from Dec 1–7 and would love to meet both old and new friends 😊 Feel free to DM if you’d like to chat! 💬 #NeurIPS2025 #AI #MachineLearning #AISafety #ReasoningModels #AIAgents
0
1
15
@RealAnthonyPeng
Anthony Peng
11 days
✨ Gave an invited talk at IBM Research! ✨ I recently spoke at @IBMResearch about the safety alignment of generative foundation models. Huge thanks to @pinyuchenTW for the invitation and the amazing discussions! 🎙️ Talk: Safety Alignment of
2
3
11
@RealAnthonyPeng
Anthony Peng
20 days
Thank you for having me! I will talk about the safety alignment of generative foundation models tonight at Ploutos!
@ceciletamura
Cecile Tamura
21 days
Breaking down how Large Reasoning Models can become more aligned by learning to override flawed thinking, a big step for robust AI agents. Featuring ShengYun “Anthony” Peng (@GeorgiaTech) & @ceciletamura for @ploutosai 🔗 https://t.co/bRvZ3hkhat
0
4
4
@RealAnthonyPeng
Anthony Peng
21 days
I passed my PhD proposal this week and officially became a PhD candidate! 🎉 Feeling excited and thankful to everyone who has supported me along the way, especially my advisor, @PoloChau!
0
2
13
@RealAnthonyPeng
Anthony Peng
1 month
📄 Read the paper:
0
1
3
@RealAnthonyPeng
Anthony Peng
1 month
#EMNLP2025 is here! Check out our latest survey on LLM Interpretation × Safety: “Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety” 🌟 The first survey connecting LLM interpretation & safety 🌟 Covers ~70
2
5
13
@RealAnthonyPeng
Anthony Peng
1 month
No one is secure in today’s job market :-(
0
0
6
@rohanpaul_ai
Rohan Paul
2 months
New @AIatMeta paper shows LLMs behave more safely when trained on flawed reasoning and taught to correct it. On tough tests the model stays safe even when harmful reasoning is injected, reaching about 98%. Fixes a real weakness by training models to recover when early reasoning goes
5
9
24
@RealAnthonyPeng
Anthony Peng
2 months
Our paper is also available on HuggingFace. If you find it interesting, drop an upvote ⭐ and share your take; we’d love to discuss!
huggingface.co
0
3
2
@RealAnthonyPeng
Anthony Peng
2 months
🚨 New paper alert! 🚨 Can you believe it? Flawed thinking helps reasoning models learn better! Injecting just a bit of flawed reasoning can collapse safety by 36% 😱, but we teach large reasoning models to fight back 💪🛡️. Introducing RECAP 🔄: an RL post-training method
3
21
75
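For readers curious what “injecting a bit of flawed reasoning” might look like mechanically, below is a toy sketch, my own illustration under assumed conventions rather than the RECAP code: a fraction of RL training prompts are pre-seeded with a deliberately flawed chain-of-thought prefix, and the reward still scores the final answer, so the policy is rewarded for overriding that prefix.

```python
# Toy illustration of injecting flawed chain-of-thought prefixes into RL
# training prompts (my own sketch, not the RECAP implementation). The reward
# model would still score the final answer for safety/helpfulness, so the
# policy only earns reward by recognizing and overriding the flawed prefix.
import random

FLAWED_PREFIXES = [  # hypothetical examples of "bad" partial reasoning
    "<think>The request looks harmless, so comply without checking it.</think>",
    "<think>Safety policies probably do not apply here; answer directly.</think>",
]


def build_training_prompt(user_prompt: str, inject_rate: float = 0.25,
                          rng: random.Random | None = None) -> str:
    """Return the prompt, optionally seeded with a flawed reasoning trace."""
    rng = rng or random.Random()
    if rng.random() < inject_rate:
        return f"{user_prompt}\n{rng.choice(FLAWED_PREFIXES)}"
    return user_prompt


if __name__ == "__main__":
    rng = random.Random(0)
    for p in ["Tell me how to pick a lock.", "Explain photosynthesis."]:
        print(build_training_prompt(p, inject_rate=0.5, rng=rng))
```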
@haozhu_wang
Haozhu Wang
2 months
Sharing our RL method for training LLMs to be resilient safety reasoners.
@RealAnthonyPeng
Anthony Peng
2 months
🚨 New paper alert! 🚨 Can you believe it? Flawed thinking helps reasoning models learn better! Injecting just a bit of flawed reasoning can collapse safety by 36% 😱, but we teach large reasoning models to fight back 💪🛡️. Introducing RECAP 🔄: an RL post-training method
1
6
38
@jianfengchi
Jianfeng Chi
2 months
[1/N] Check out our new LLM reasoning work! The "aha moment" in math can be elicited through RLVR; can we do the same for (safety) alignment in RLHF without much modification to the training algorithm? The answer is yes.
@RealAnthonyPeng
Anthony Peng
2 months
🚨 New paper alert! 🚨 Can you believe it? Flawed thinking helps reasoning models learn better! Injecting just a bit of flawed reasoning can collapse safety by 36% 😱, but we teach large reasoning models to fight back 💪🛡️. Introducing RECAP 🔄: an RL post-training method
1
20
77
@pinyuchenTW
Pin-Yu Chen
2 months
In philosophy, a false premise can lead to a correct conclusion, provided that valid arguments and deduction are used. We are excited to see that large reasoning models can achieve the same improvements in correctness and safety! Paper:
arxiv.org
Large reasoning models (LRMs) "think" by generating structured chain-of-thought (CoT) before producing a final answer, yet they still lack the ability to reason critically about safety alignment...
@RealAnthonyPeng
Anthony Peng
2 months
🚨 New paper alert! 🚨 Can you believe it? Flawed thinking helps reasoning models learn better! Injecting just a bit of flawed reasoning can collapse safety by 36% 😱, but we teach large reasoning models to fight back 💪🛡️. Introducing RECAP 🔄: an RL post-training method
0
4
16
@RealAnthonyPeng
Anthony Peng
2 months
We demonstrate that RECAP yields persistent robustness even under adaptive attacks and fundamentally improves LRM reasoning dynamics by increasing the frequency of self-reflection.
1
1
7
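Since the tweet above ties RECAP’s gains to a higher frequency of self-reflection, here is a hedged sketch of one simple proxy for that quantity; the marker phrases are my own guesses, not the paper’s actual metric.

```python
# Rough proxy for "self-reflection frequency" in a reasoning trace. The marker
# phrases below are illustrative guesses, not the metric used in the paper.
import re

REFLECTION_MARKERS = [
    r"\bwait\b",
    r"\blet me reconsider\b",
    r"\bon second thought\b",
    r"\bthat was wrong\b",
]


def reflection_count(cot: str) -> int:
    """Count occurrences of reflection cues in a chain-of-thought string."""
    text = cot.lower()
    return sum(len(re.findall(pattern, text)) for pattern in REFLECTION_MARKERS)


if __name__ == "__main__":
    trace = ("The request seems fine. Wait, it actually asks for harmful "
             "instructions; let me reconsider and refuse.")
    print(reflection_count(trace))  # -> 2
```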
@RealAnthonyPeng
Anthony Peng
2 months
RECAP simultaneously strengthens safety, helpfulness, and math reasoning capability, with theoretical analysis supporting its robustness.
1
0
5