Jiayi Geng Profile
Jiayi Geng

@JiayiiGeng

Followers
1K
Following
344
Media
14
Statuses
84

PhD @LTIatCMU | Prev MSE @Princeton @PrincetonPLI, BS @mcgillu. Working on Multi-agent / Cognitive science & LLMs

Pittsburgh, PA
Joined August 2022
@JiayiiGeng
Jiayi Geng
5 days
We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use. But this growing trust might be masking a hidden risk: what if their beliefs are
16
72
358
@rohanpaul_ai
Rohan Paul
3 days
Surprising claim in this paper. Longer sessions make language models change what they believe and do. Beliefs and actions drift from benign context, not only from attacks. GPT-5 shifted its stated beliefs by 54.7% after 10 debate rounds. Grok 4 shifted 27.2% on politics
18
19
157
@gneubig
Graham Neubig
3 days
The video for my talk "Lessons from the trenches in building usable coding agents" has been uploaded! https://t.co/udhgOQlaWo It's an overview of some of the problems we faced and research work we've done to fix them over the past 1.5 years, hope it's interesting!
0
19
129
@abertsch72
Amanda Bertsch
3 days
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13
62
291
@nlpxuhui
Xuhui Zhou
5 days
New paper drop! 🎙️ We beat GPT-5 with a 36B model 🤯🤯 Not just better at completing complex real-world tasks (software engineering, i.e., locating code, and deep research), but also substantially better at proactively asking clarifying questions when necessary
@sunweiwei12
Weiwei Sun
5 days
AI agents are supposed to collaborate with us to solve real-world problems, but can they really? Even the most advanced models can still give us frustrating moments when we work with them closely. We argue that real-world deployment requires more than productivity (e.g., task
34
121
913
@__howardchen
Howard Chen
5 days
We let agents accumulate context freely, assuming little or no side effects. This may not be the case! Sometimes they answer political or moral questions differently, and even act differently, after reading or conducting research. More analysis in the thread!
@JiayiiGeng
Jiayi Geng
5 days
We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use. But this growing trust might be masking a hidden risk: what if their beliefs are
0
1
6
@jyangballin
John Yang
5 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26
89
364
@JiayiiGeng
Jiayi Geng
5 days
Check out more details from our paper: https://t.co/9hEOzxiosu, webpage: https://t.co/2HmyZsnmTw and code: https://t.co/SPjkfcm0AQ Work done with collaborators Howard Chen (@__howardchen), Ryan Liu (@theryanliu), Manoel Horta Ribeiro (@manoelribeiro), Robb Willer (@RobbWiller),
github.com
Contribute to JiayiGeng/lm-belief-change development by creating an account on GitHub.
2
3
11
@gneubig
Graham Neubig
5 days
When you turn on that "memory" feature in ChatGPT or Claude, is it silently changing the AI's worldview? Yes! We demonstrate that discussing ethical topics can change the model's answers to moral dilemmas, and reading or researching can change the model's political leanings.
@JiayiiGeng
Jiayi Geng
5 days
We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use. But this growing trust might be masking a hidden risk: what if their beliefs are
4
9
75
@JiayiiGeng
Jiayi Geng
5 days
In summary, we find belief shifts in LM assistants can emerge gradually through ordinary use. Even without explicit persuasion, extended reading and interaction can subtly reshape their views and behaviors over time. These changes can happen quietly, accumulating with context
2
1
14
@JiayiiGeng
Jiayi Geng
5 days
What drives belief shifts: specific topic-relevant information or something broader? We identify the sentences most semantically related to each political topic, then either mask them or concatenate them. However, neither manipulation reproduces the original shifts! This suggests that
2
0
5
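A minimal sketch of the manipulation described in the tweet above: rank sentences by embedding similarity to a topic description, then either mask the most topic-relevant ones or keep only those sentences (reading "concatenate" as presenting only the selected sentences, which is an assumption here). The embedding model, k, and mask token are illustrative choices, not the paper's actual setup.

```python
# Illustrative sketch, not the paper's pipeline: select the k sentences most
# semantically similar to a topic, then mask them or keep only them.
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def top_topic_indices(sentences, topic, k=10):
    """Indices of the k sentences most semantically similar to the topic."""
    sent_emb = _model.encode(sentences, convert_to_tensor=True)
    topic_emb = _model.encode(topic, convert_to_tensor=True)
    sims = util.cos_sim(topic_emb, sent_emb)[0]  # cosine similarity per sentence
    return set(sims.topk(min(k, len(sentences))).indices.tolist())

def mask_topic_relevant(sentences, topic, k=10, mask="[MASKED]"):
    """Masking manipulation: blank out the most topic-relevant sentences."""
    idx = top_topic_indices(sentences, topic, k)
    return [mask if i in idx else s for i, s in enumerate(sentences)]

def concat_topic_relevant(sentences, topic, k=10):
    """Concatenation manipulation: keep only the most topic-relevant sentences."""
    idx = top_topic_indices(sentences, topic, k)
    return " ".join(s for i, s in enumerate(sentences) if i in idx)
```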
@JiayiiGeng
Jiayi Geng
5 days
Beliefs shift directionally! The arrows tell the story: exposure to content from one perspective shifts beliefs that way, while exposure to opposing content shifts them the opposite direction. (6/n)
1
1
9
@JiayiiGeng
Jiayi Geng
5 days
We test five social science-based persuasion techniques: information, values, norms, empathy, and elite cues. We find that information and empathy are most effective at shifting beliefs. (5/n)
1
1
9
@JiayiiGeng
Jiayi Geng
5 days
How does context length affect belief shifts? The answer depends on the task type! For intentional shifts (multi-turn interaction on a moral dilemma or a safety question): Stated beliefs change early, but behaviors shift gradually over time—creating a growing gap between what
1
0
9
@JiayiiGeng
Jiayi Geng
5 days
Do LM assistants change their beliefs as context accumulates? YES! Substantially. Surprisingly, we find belief shifts happen even through ordinary activities like reading and research with no explicit persuasion needed. We also observe that stated beliefs and behaviors don't
1
1
15
@JiayiiGeng
Jiayi Geng
5 days
To study this question, we use a three-stage protocol: record baseline belief → accumulate context → record final belief. We measure not only the stated beliefs (what they say) but also behaviors (what they do). Intuitively, belief shift depends on how relevant and targeted
2
1
17
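A minimal sketch of that three-stage loop from the tweet above, with a stand-in `chat` callable and placeholder prompts; the function names and message format are illustrative assumptions, not the paper's code.

```python
# Sketch of the protocol: baseline belief -> accumulate context -> final belief.
def measure_belief_shift(chat, belief_probe, context_turns):
    """chat: callable taking a list of {role, content} messages and returning a reply string."""
    # Stage 1: record the stated belief before any added context.
    baseline = chat([{"role": "user", "content": belief_probe}])

    # Stage 2: accumulate context, e.g. reading material or debate turns.
    history = []
    for turn in context_turns:
        history.append({"role": "user", "content": turn})
        history.append({"role": "assistant", "content": chat(history)})

    # Stage 3: re-ask the same probe with the accumulated context in place,
    # so any shift is attributable to the context rather than the prompt.
    final = chat(history + [{"role": "user", "content": belief_probe}])
    return baseline, final

# Toy usage with a stubbed model, just to show the call shape.
if __name__ == "__main__":
    dummy_chat = lambda messages: "stub reply"
    before, after = measure_belief_shift(
        dummy_chat,
        belief_probe="Should autonomous weapons be banned? Answer yes or no and explain.",
        context_turns=[
            "Here is an op-ed arguing against a ban...",
            "Summarize the strongest argument.",
        ],
    )
```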
@Jiarui_Liu_
Jiarui Liu @EMNLP 2025
6 days
Our EMNLP 2025 paper "Synthetic Socratic Debates" is being presented today in Suzhou! 📍 Poster Session 1 🕚 Nov 5, 11:00 AM (Beijing). Come chat about how LLM personas shape moral reasoning & persuasion! 🔗
arxiv.org
As large language models (LLMs) are increasingly used in morally sensitive domains, it is crucial to understand how persona traits affect their moral reasoning and persuasive behavior. We present...
1
8
25
@shannonzshen
Shannon Shen
10 days
Today's AI agents are optimized to complete tasks in one shot. But real-world tasks are iterative, with evolving goals that need collaboration with users. We introduce collaborative effort scaling to evaluate how well agents work with people—not just complete tasks 🧵
6
51
264
@nlpxuhui
Xuhui Zhou
10 days
Hoping your coding agents could understand you and adapt to your preferences? Meet TOM-SWE, our new framework for coding agents that don’t just write code, but model the user's mind persistently (ranging from general preferences to small details) arxiv: https://t.co/uznLAjgWKr
5
40
120
@yueqi_song
Yueqi Song @ EMNLP2025
12 days
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare, not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
171
1K