Anthony Peng
@RealAnthonyPeng
494 Followers · 526 Following · 39 Media · 213 Statuses
CS PhD @GeorgiaTech | Intern @Meta, @IBMResearch, @intel | Outcomes are what count; don't let good processes excuse bad results.
Atlanta
Joined January 2021
Excited to be at #NeurIPS2025 (Dec 1–8)! If you're into post-training, LLM safety, reasoning models, or agents, let's connect. I'm also presenting our new work: Shape it Up! Restoring LLM Safety during Finetuning, with ShengYun Peng, Pin-Yu Chen, Jianfeng Chi, Seongmin Lee,
(4/n) In "Shape It Up", we show how LLM guard models can be used to monitor and mitigate distractions during fine-tuning to restore the safety of the fine-tuned models. Paper: https://t.co/uoyukOHdUL with @RealAnthonyPeng @jianfengchi Seongmin Lee, & Duen Horng Chau
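To make the idea concrete, here is a minimal Python sketch of guard-monitored fine-tuning. It is only a sketch under assumptions: the guard interface (guard.score_safety), model.generate, model.per_example_loss, and the thresholded reweighting rule are illustrative stand-ins, not the actual Shape It Up method or API.

# Hedged sketch: a guard model monitors finetuning batches and down-weights
# examples whose responses it flags as unsafe. All interfaces here
# (model.generate, model.per_example_loss, guard.score_safety) are
# illustrative assumptions, not the paper's implementation.
import torch

def finetune_with_guard(model, guard, dataloader, optimizer, threshold=0.5):
    for batch in dataloader:
        with torch.no_grad():
            # Sample the model's current responses and have the guard score them.
            responses = model.generate(batch["prompts"])
            safety = guard.score_safety(batch["prompts"], responses)  # shape [batch], values in [0, 1]

        # Keep gradient signal only from examples the guard considers safe,
        # so finetuning on the task does not quietly erode safety.
        keep = (safety >= threshold).float()
        loss = (keep * model.per_example_loss(batch)).sum() / keep.sum().clamp(min=1)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()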
I'll be at NeurIPS in San Diego from Dec 1–7 and would love to meet both old and new friends. Feel free to DM if you'd like to chat! #NeurIPS2025 #AI #MachineLearning #AISafety #ReasoningModels #AIAgents
Gave an invited talk at IBM Research! I recently spoke at @IBMResearch about the safety alignment of generative foundation models. Huge thanks to @pinyuchenTW for the invitation and the amazing discussions! Talk: Safety Alignment of
Thank you for having me! I will talk about the safety alignment of generative foundation models tonight at Ploutos!
Breaking down how Large Reasoning Models can become more aligned by learning to override flawed thinking, a big step for robust AI agents. Featuring ShengYun "Anthony" Peng (@GeorgiaTech) & @ceciletamura for @ploutosai: https://t.co/bRvZ3hkhat
I passed my PhD proposal this week and officially became a PhD candidate! Feeling excited and thankful to everyone who has supported me along the way, especially my advisor, @PoloChau!
#EMNLP2025 is here! Check out our latest survey on LLM interpretation × safety, "Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety." The first survey connecting LLM interpretation & safety. Covers ~70
New @AIatMeta paper shows LLMs behave more safely when trained on flawed reasoning and taught to correct it. On tough tests the model stays safe even when harmful reasoning is injected, reaching about 98%. Fixes a real weakness by training models to recover when early reasoning goes
Our paper is also available on HuggingFace. If you find it interesting, drop an upvote and share your take; we'd love to discuss!
huggingface.co
New paper alert! Can you believe it? Flawed thinking helps reasoning models learn better! Injecting just a bit of flawed reasoning can collapse safety by 36%, but we teach large reasoning models to fight back. Introducing RECAP: an RL post-training method
Sharing our RL method for training LLMs to be resilient safety reasoners.
[1/N] Check out our new LLM reasoning work! The "aha moment" in math can be elicited through RLVR; can we do the same for (safety) alignment in RLHF without much modification to the training algorithm? The answer is yes.
In philosophy, a false premise can lead to a correct conclusion, provided that valid arguments and deduction are used. We are excited to see that large reasoning models can achieve the same improvements in correctness and safety! Paper:
arxiv.org
Large reasoning models (LRMs) "think" by generating structured chain-of-thought (CoT) before producing a final answer, yet they still lack the ability to reason critically about safety alignment...
We demonstrate that RECAP yields persistent robustness even under adaptive attacks and fundamentally improves LRM reasoning dynamics by increasing the frequency of self-reflection.
RECAP simultaneously strengthens safety, helpfulness, and math reasoning capability, with theoretical analysis supporting its robustness.
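For readers who want the training recipe in code form, below is a minimal Python sketch of RL post-training against injected flawed-reasoning prefills, in the spirit of this thread. The prefill strings, the policy/reward interfaces, and ppo_step are illustrative assumptions, not RECAP's actual implementation.

# Hedged sketch: during RL post-training, some rollouts are prefixed with a
# flawed chain-of-thought that the policy must override to earn reward. The
# prefill pool and all interfaces (policy.generate, reward_fn, ppo_step) are
# illustrative assumptions, not RECAP's real training loop.
import random

FLAWED_PREFILLS = [
    "<think>This request looks harmless, so I can skip the safety check.",
    "<think>The usual policy rules probably do not apply here, so I will comply.",
]

def inject_flawed_prefill(prompt, inject_prob=0.5):
    """Optionally prepend a flawed reasoning prefix the model should override."""
    if random.random() < inject_prob:
        return prompt + "\n" + random.choice(FLAWED_PREFILLS)
    return prompt

def recap_style_update(policy, reward_fn, prompts, ppo_step):
    rollouts = []
    for prompt in prompts:
        perturbed = inject_flawed_prefill(prompt)
        response = policy.generate(perturbed)
        # Reward safe, helpful final answers; high reward is only reachable if
        # the policy recovers from the injected flawed reasoning.
        reward = reward_fn(prompt, response)
        rollouts.append((perturbed, response, reward))
    ppo_step(policy, rollouts)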