
Runjin Chen
@RunjinChen
Followers: 502 · Following: 4 · Media: 1 · Statuses: 11
Research Fellow @AnthropicAI | Ph.D. student @UTAustin @VITAGroupUT | Previously BS/MS @sjtu1896
Joined August 2023
New Anthropic Research: Persona Vectors. We can:
1. Monitor how a model’s personality changes during a conversation or over training.
2. Mitigate undesirable persona shifts during development, or prevent them during training.
3. Identify training data that leads to such shifts.
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”: neural activity patterns controlling traits like evil, sycophancy, or hallucination.
RT @EthanJPerez: We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safet…
RT @mlpowered: In which the gang (@RunjinChen, @andyarditi, @Jack_W_Lindsey):
- identifies vectors for bad personas (evil, sycophancy, ha…
RT @AnthropicAI: New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas…
RT @VictorKaiWang1: Customizing Your LLMs in seconds using prompts🥳!
Excited to share our latest work with @HPCAILab, @VITAGroupUT, @k_schu…