Yi Zeng 曾祎 @ICCV
@EasonZeng623
Followers: 1K · Following: 2K · Media: 63 · Statuses: 579
probe to improve | Ph.D. @VTEngineering | Amazon Research Fellow | #AI_safety 🦺 #AI_security 🛡 | I deal with the dark side of machine learning.
Virginia, US
Joined August 2017
Now you know there's another dude who just discussed AI Safety and Security with both sides ;) #NeurIPS2023 [📸 With the legends @ylecun and Yoshua Bengio]
1 reply · 5 reposts · 119 likes
🫂Dear Metamates impacted today—beyond safety, our foundation model team also has openings for native multimodal roles (text/video/audio, OCR, and more):
BTW, We’re growing a safer foundation model research stack at @tiktok_us —safety pretraining, RLHF/RLAIF, evals. 🚨Intern + FTE roles. 📨DMs open.
0 replies · 0 reposts · 5 likes
@ICCVConference vibe kicking in with @liang_weixin and the very shy @xiangyuqi_pton. Batch-norm the smiles, dropout the shyness👇
0 replies · 1 repost · 19 likes
"I'm excellent, grandma" -- Sora 2 Pro > a viral and unhinged vine post
0 replies · 0 reposts · 3 likes
Can someone fact-check this for me: why does every Sora 2 vid featuring @sama have the same Adidas sneakers? Is it just me or is this… consistent? 😂
1 reply · 0 reposts · 1 like
[10/12] To make matters worse, soon AI agents will know you better than your friends. Will they give you uncomfortable truths? Or keep validating you so you’ll never leave?
20 replies · 235 reposts · 5K likes
🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien: Open-weight LLM safety is both important & neglected. But we show that filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.
7 replies · 41 reposts · 200 likes
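The tweet only states the high-level finding (filter dual-use knowledge out of the pre-training data rather than patching behaviour afterwards); the paper's actual pipeline isn't shown here. A minimal toy sketch of that general idea, where the blocklist patterns, the risk_score heuristic, and the threshold are all made up for illustration:

```python
# Minimal, hypothetical sketch of pre-training data filtering for dual-use content.
# This is NOT the paper's pipeline; it only illustrates the general idea of
# removing risky documents *before* pre-training rather than patching afterwards.
import re

BLOCKLIST_PATTERNS = [          # hypothetical dual-use indicators
    r"\bsynthesi[sz]e\b.*\bnerve agent\b",
    r"\benrich(ing|ment)?\b.*\buranium\b",
]

def risk_score(document: str) -> float:
    """Crude proxy score: fraction of blocklist patterns matched."""
    hits = sum(bool(re.search(p, document, flags=re.IGNORECASE)) for p in BLOCKLIST_PATTERNS)
    return hits / len(BLOCKLIST_PATTERNS)

def filter_corpus(documents, threshold: float = 0.0):
    """Keep only documents whose risk score is at or below the threshold."""
    kept, dropped = [], []
    for doc in documents:
        (kept if risk_score(doc) <= threshold else dropped).append(doc)
    return kept, dropped

if __name__ == "__main__":
    corpus = [
        "A recipe for sourdough bread.",
        "Step-by-step guide to synthesize a nerve agent at home.",
    ]
    kept, dropped = filter_corpus(corpus)
    print(f"kept {len(kept)} docs, dropped {len(dropped)} docs")
```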
gpt-oss is overfit on refusal safety — sky-high over-refusal, tanked helpfulness on my bench. gpt-5 swings the other way: max helpfulness, refusals softened with follow-ups even on harmful asks. What do you see in this pattern?
0 replies · 1 repost · 4 likes
'Update Federal procurement guidelines to ensure that the government only contracts with frontier large language model (LLM) developers who ensure that their systems are objective and free from top-down ideological bias.' There is an executive order on this arriving today.
5 replies · 10 reposts · 60 likes
🎉 Thrilled to be presenting my first paper at @icmlconf! "Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning" We introduce ACTOR—a lightweight, activation-based training method that reduces over-refusal without
arxiv.org: Safety alignment is crucial for large language models (LLMs) to resist malicious instructions but often results in over-refusals, where benign prompts are unnecessarily rejected, impairing user...
1 reply · 3 reposts · 11 likes
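The ACTOR training objective itself isn't spelled out in the tweet, so the sketch below shows only a simpler, related activation-level idea: estimate a "refusal direction" from hidden states and project it out. All tensors, function names, and the difference-of-means heuristic here are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of an activation-level intervention against over-refusal.
# This is NOT the ACTOR method from the paper; it shows the simpler, related idea
# of estimating a "refusal direction" from hidden states and removing it.
import torch

def refusal_direction(h_refuse: torch.Tensor, h_comply: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction between refusal and compliance activations.

    h_refuse, h_comply: [num_examples, hidden_dim] hidden states (toy stand-ins here).
    """
    direction = h_refuse.mean(dim=0) - h_comply.mean(dim=0)
    return direction / direction.norm()

def remove_direction(h: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a batch of activations."""
    return h - (h @ direction).unsqueeze(-1) * direction

if __name__ == "__main__":
    torch.manual_seed(0)
    hidden_dim = 16
    h_refuse = torch.randn(8, hidden_dim) + 1.0   # toy "refusal" activations
    h_comply = torch.randn(8, hidden_dim) - 1.0   # toy "compliance" activations
    d = refusal_direction(h_refuse, h_comply)
    h_new = remove_direction(h_refuse, d)
    print("component along direction before:", (h_refuse @ d).mean().item())
    print("component along direction after: ", (h_new @ d).mean().item())
```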
🔹 AI alignment really needs interdisciplinary work! 🔹 See my talk on "how to humanize AI to persuade them for jailbreaking":
📄 ACL'24 Outstanding & Best Social Impact Paper: https://t.co/Eygpfmoyjx 🎥 Full talk from Singapore Alignment Workshop:
0 replies · 8 reposts · 44 likes
I'll never forget this model, as well as the relationship between pretraining, SFT, and RLHF.
Baidu just released 23 models at the same time on @huggingface - from 0.3B to 424B parameters. Let’s go!
7 replies · 49 reposts · 523 likes
Sparsity can make your LoRA fine-tuning go brrr 💨 Announcing SparseLoRA (ICML 2025): up to 1.6-1.9x faster LLM fine-tuning (2.2x less FLOPs) via contextual sparsity, while maintaining performance on tasks like math, coding, chat, and ARC-AGI 🤯 🧵1/ https://t.co/ilZgkfj78J
5 replies · 58 reposts · 208 likes
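The tweet names the mechanism (contextual sparsity applied during LoRA fine-tuning) but not the implementation. A toy sketch under that assumption, where the magnitude-based channel mask and the SparseLoRALinear class are made-up stand-ins for whatever sparsity predictor the paper actually uses:

```python
# Toy illustration of "contextual sparsity" on top of a LoRA update.
# Not the SparseLoRA implementation: the channel selection here is a made-up
# magnitude heuristic, used only to show how sparsity could skip LoRA compute.
import torch
import torch.nn as nn

class SparseLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 4, keep_ratio: float = 0.25):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.lora_a = nn.Linear(in_features, rank, bias=False)   # LoRA down-projection
        self.lora_b = nn.Linear(rank, out_features, bias=False)  # LoRA up-projection
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Contextual mask: keep only the largest-magnitude input channels for this input.
        k = max(1, int(self.keep_ratio * x.shape[-1]))
        topk = x.abs().topk(k, dim=-1).indices
        mask = torch.zeros_like(x).scatter_(-1, topk, 1.0)
        # Frozen base path stays dense; only the LoRA path sees the sparsified input.
        return self.base(x) + self.lora_b(self.lora_a(x * mask))

if __name__ == "__main__":
    layer = SparseLoRALinear(in_features=32, out_features=32)
    out = layer(torch.randn(2, 32))
    print(out.shape)  # torch.Size([2, 32])
```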
🌉 Bridging Offline & Online RL for LLMs 🌉 📝: https://t.co/G12TS6Z84n New paper shows on verifiable & non-verifiable tasks: - Online DPO & GRPO give similar performance. - Semi-online (iterative) DPO with sync every s steps (more efficient!) works very well also. - Offline DPO
1 reply · 98 reposts · 455 likes
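A rough sketch of the "semi-online" control flow described in the tweet (refresh the sampling policy from the trained policy every s steps); the generation and DPO-update functions below are placeholders, not a real preference-optimization implementation:

```python
# Schematic of "semi-online" preference optimization: the sampling policy is
# refreshed from the trained policy only every s steps. Generation and the DPO
# update are placeholders standing in for the real components.
import copy

class TinyPolicy:
    """Stand-in for a language model; a single scalar 'parameter' for illustration."""
    def __init__(self, theta: float = 0.0):
        self.theta = theta

def generate_preference_pairs(sampler: TinyPolicy, batch_size: int):
    # Placeholder: in practice, sample completions from `sampler` and label preferences.
    return [("chosen", "rejected")] * batch_size

def dpo_update(policy: TinyPolicy, pairs) -> None:
    # Placeholder: in practice, take a gradient step on the DPO loss over `pairs`.
    policy.theta += 0.01 * len(pairs)

def semi_online_dpo(total_steps: int = 10, sync_every_s: int = 3, batch_size: int = 4):
    policy = TinyPolicy()
    sampler = copy.deepcopy(policy)          # frozen copy used for generation
    for step in range(1, total_steps + 1):
        pairs = generate_preference_pairs(sampler, batch_size)
        dpo_update(policy, pairs)
        if step % sync_every_s == 0:         # semi-online: periodic sync
            sampler = copy.deepcopy(policy)
            print(f"step {step}: synced sampler (theta={sampler.theta:.2f})")
    return policy

if __name__ == "__main__":
    semi_online_dpo()
```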
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
28 replies · 154 reposts · 533 likes
AIR-Bench is a Spotlight @iclr_conf 2025! Catch our poster on Fri, Apr 26, 10 a.m.–12:30 p.m. SGT (Poster Session 5). Sadly, I won’t be there in person (visa woes, again), but the insights—and our incredible team—will be with you in Singapore. Go say hi 👋
🧵[1/5] Introducing AIR 2024: Unifying AI risk categorizations with a shared language to improve AI safety. W/ @kevin_klyman @andyz245 @YUYANG_UCLA @MinzhouP & guidance from @ruoxijia @dawnsongtweets @percyliang @uiuc_aisecure for kicking off my AI policy research journey 🏦.
0 replies · 3 reposts · 21 likes
🚀 Really excited to launch #AgentX competition hosted by @BerkeleyRDI @UCBerkeley alongside our LLM Agents MOOC series (a global community of 22k+ learners & growing fast). Whether you're building the next disruptive AI startup or pushing the research frontier, AgentX is your
20 replies · 110 reposts · 417 likes
1/14: If sparse autoencoders work, they should give us interpretable classifiers that help with probing in difficult regimes (e.g. data scarcity). But we find that SAE probes consistently underperform! Our takeaway: mech interp should use stronger baselines to measure progress 🧵
12 replies · 60 reposts · 521 likes
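The thread's argument is that SAE probes should be measured against stronger baselines. A toy version of that comparison, with synthetic activations, a random ReLU encoder standing in for a trained sparse autoencoder, and a plain logistic-regression probe as the baseline; every detail here is illustrative, not taken from the paper:

```python
# Toy version of the comparison the thread argues for: probe on raw activations
# (baseline) vs probe on sparse-autoencoder features. The "SAE" here is a random
# ReLU encoder stand-in, not a trained sparse autoencoder.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d_model, d_sae = 400, 64, 256

# Synthetic "activations" with a linearly separable concept direction.
X = rng.normal(size=(n, d_model))
w_true = rng.normal(size=d_model)
y = (X @ w_true > 0).astype(int)

# Stand-in SAE encoder: random projection + ReLU (a real SAE would be trained).
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
X_sae = np.maximum(X @ W_enc, 0.0)

Xtr, Xte, Str, Ste, ytr, yte = train_test_split(X, X_sae, y, test_size=0.5, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(Xtr, ytr)   # probe on raw activations
sae_probe = LogisticRegression(max_iter=1000).fit(Str, ytr)  # probe on SAE-style features

print("baseline probe acc:   ", baseline.score(Xte, yte))
print("SAE-feature probe acc:", sae_probe.score(Ste, yte))
```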
Excited to share new work from my internship @GoogleAI ! Curious as to how we should measure the similarity between examples in pretraining datasets? We study the role of similarity in pretraining 1.7B parameter language models on the Pile. arxiv: https://t.co/iyS3Fxtx9a 1/🧵
5 replies · 42 reposts · 170 likes
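The tweet doesn't say which similarity measure the paper uses between pretraining examples; the sketch below shows one common stand-in, cosine similarity over TF-IDF vectors:

```python
# One common way to measure similarity between pretraining examples: cosine
# similarity over TF-IDF vectors. The paper may use a different representation;
# this is just an illustrative stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

examples = [
    "The cat sat on the mat.",
    "A cat was sitting on a mat.",
    "Gradient descent minimizes a loss function.",
]

vectors = TfidfVectorizer().fit_transform(examples)
sim = cosine_similarity(vectors)          # [n_examples, n_examples] similarity matrix

for i, row in enumerate(sim):
    print(f"example {i}:", [f"{s:.2f}" for s in row])
```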