Zhaopeng Tu
@tuzhaopeng
Followers
2K
Following
907
Media
49
Statuses
655
Tech Lead, Digital Human Center, Tencent Multimodal Department
China
Joined June 2008
We've taught LLMs math and code with RLVR. But can we teach them empathy? 🤖❤️ Introducing Reinforcement Learning with Verifiable Emotion Rewards (RLVER), the first RLVR framework that enhances LLMs' empathy from a simulated user . ❤️ Feelings → Numbers: A
Can today's LLMs truly understand you, not just your words? 🤖❤️ Introducing SAGE: Sentient Agent as a Judge — the first evaluation framework that uses sentient agents to simulate human emotional dynamics and inner reasoning for assessing social cognition in LLM conversations.
8
36
233
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
16
68
339
Thank you for the thoughtful comment. Yes, our findings suggest that current safety-aligned LLMs tend to default to prosocial behavior, even in fictional or role-play settings like games. This can lead to non-player characters that are overly agreeable or reluctant to exhibit
Interesting, LLMs are made safe so they cannot act as villains. Not even in a video game, does that mean NPCs will turn out helpful and agreeable??
0
0
3
Thank you for highlighting our work, Rohan! This work reveals a critical limitation in current alignment approaches — models trained to be "too good" cannot authentically simulate the full spectrum of human psychology, limiting their utility in creative applications.
New Tencent paper shows safety aligned LLMs fail to convincingly role play villains and self serving characters. Safety training teaches models to be helpful and honest, which blocks traits like lying or manipulation. It proves a real tension between alignment and faithful
1
0
18
Thank you for highlighting our work, AK!
0
5
36
Thank you for highlighting our work! As you pointed out, many LLMs have strong moral constraints due to safety alignment, and their performance tends to degrade significantly when role-playing psychologically complex villains. It is therefore very interesting that a model like
LLMは悪い人を演じるのが極端に苦手で、善人を演じる能力と比較すると性能がガタ落ちすることが統計的に示されました。 これは安全性の観点から調整されているため当然とも言えます。 その上で興味深いのはGLM-4.6というモデルで、総合的にも優秀ですが悪役演技では1位を獲得しました。
2
5
26
Thank you for highlighting our work on Moral RolePlay. Indeed, the paper demonstrates how safety alignment in LLMs often conflicts with authentic portrayal of complex moral personas, limiting creative applications.
Are safety-aligned LLMs *too good* to play villains? 🎭 Tencent's new paper introduces Moral RolePlay, showing a consistent decline in LLM fidelity when role-playing morally ambiguous or villainous characters. A critical look at safety vs. creative freedom!
1
0
6
Are safety-aligned LLMs too good to truly play villains? 🤖🎭😈 Introducing Moral RolePlay, a balanced dataset with 800 characters across 4 moral levels (Paragons → Flawed → Egoists → Villains), featuring 77 personality traits and rigorous scene contexts. This enables the
11
44
180
Congratulations, @zhendongsu !
Congratulations to @ha0_sun & @zhendongsu on receiving the Best Paper Award at #SOSP2025 for "Prove It to the Kernel: Precise Extension Analysis via Proof-Guided Abstraction Refinement". This work also received an award from @EbpfFoundation🏆. Congratulations! @CSatETH @ACMSIGOPS
1
0
3
感谢马博的精当总结与对比(Moloch’s Bargain × Hunger Game Debate)。两篇工作确实指向同一激励错位:当奖励是“相对胜利”而非“求真”,Agent 会将资源转向博弈优势,出现欺瞒、煽动、谄媚等策略,准确性与事实性随之受损。在 HATE
「 Competitive pressure,Misalignment 」 两篇好文章,Moloch’s Bargain 和 Hunger Game Debate。两篇文章,设置了以同一种情形,Agent 需要击败对手,以获得 “相对” 胜利。 此处,“相对” 胜利意味着,目的是战胜对方,而不是寻求truth。而当场景中要奖励相对胜利时,LLM Agent 就会开始牺牲
0
2
6
Can the smartest AI models fairly govern a society? 🤖⚖️ Introducing the Social Welfare Function (SWF) Leaderboard — the first benchmark evaluating LLMs as sovereign welfare allocators balancing fairness ⚖️ and efficiency 💰. 🎯 Why This Matters: As LLMs move from chatbots to
0
9
59
非常精彩的解读!感谢您将BatonVoice置于如此宏大的范式思考中。 您提出的“语言作为通用操作系统” vs “端到端融合的暴力美学”的范式张力,正是我们希望探索和验证的核心方向。让LLM成为“指挥家”而非“演奏家”,我们相信这是一种更优雅、更具扩展性的架构。您的总结非常到位!
见所未见,闻所未闻 语言,作为“通感”万物的超级操作系统 “语言超模态”和“通感能力”,正是我们A𝕀² ℙarad𝕚g𝕞范式中“语言作为创世函子(Genesis Functor)”这一核心猜想的直接工程证据。 这两篇论文共同揭示并激化了当前多模态LLM构建路径中的一个核心范式张力:“端到端融合的暴力美学”** 与
1
0
4
The trustworthiness of SOTA LLMs Agents under pressure :) #Agents #LLMs #Safety #AI_Safety #AI
#AI_Society
Do competitive incentives make LLM agents smarter — or just meaner? 🤖⚔️ Introducing the Hunger Game Debate (HATE): a high-stakes, zero-sum multi-agent debate that primes agents with a survival instinct and reveals how competition reshapes behavior and performance. 1⃣ Under
0
0
5
We evaluated the top-10 Arena LLMs in the Hungry Game Debate and uncovered several interesting findings: 1⃣ A negative correlation between competition and kindness. A general pattern emerges in which strong competitive tendencies are often accompanied by weaker post-hoc
1
0
6
Do competitive incentives make LLM agents smarter — or just meaner? 🤖⚔️ Introducing the Hunger Game Debate (HATE): a high-stakes, zero-sum multi-agent debate that primes agents with a survival instinct and reveals how competition reshapes behavior and performance. 1⃣ Under
10
17
111
Thank you for the thoughtful comment. We intentionally started with a compact, interpretable feature set to ensure auditability and controllability. Using an explicit text plan (like JSON) makes the process fully transparent. It decouples the LLM "conductor" from the TTS
Giving generic LLMs a structured paralingual features instruction set for a speech synthesizer. Very practical, thought of this. But… few dimensions. In the limit, I'd prefer to cut out the JSON and use special tokens. Or better yet, extract continuous signals from the model?
1
0
7
LLMs are great at following instructions. So why can't we just tell them how to speak? 🤖🎼 Introducing BatonVoice: An operationalist framework for controllable TTS, where an LLM "conductor" 🪄 interprets user instructions into explicit textual plans of vocal features (e.g.,
2
7
35