Jiayi Geng Profile
Jiayi Geng

@JiayiiGeng

Followers
1K
Following
344
Media
14
Statuses
84

PhD @LTIatCMU | Prev MSE @Princeton @PrincetonPLI, BS @mcgillu. Working on Multi-agent / Cognitive science & LLMs

Pittsburgh, PA
Joined August 2022
@JiayiiGeng
Jiayi Geng
5 days
We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use. But this growing trust might be masking a hidden risk: what if their beliefs are
16
72
358
@rohanpaul_ai
Rohan Paul
3 days
Surprising claim in this paper. Longer sessions make language models change what they believe and do. Beliefs and actions drift from benign context, not only from attacks. GPT-5 shifted its stated beliefs by 54.7% after 10 debate rounds. Grok 4 shifted 27.2% on politics
18
19
157
@gneubig
Graham Neubig
3 days
The video for my talk "Lessons from the trenches in building usable coding agents" has been uploaded! https://t.co/udhgOQlaWo It's an overview of some of the problems we faced and research work we've done to fix them over the past 1.5 years, hope it's interesting!
0
19
129
@abertsch72
Amanda Bertsch
3 days
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13
62
291
@nlpxuhui
Xuhui Zhou
5 days
New paper drop! 🎙️ We beat GPT-5 with a 36B model 🤯🤯 Not just better at completing complex real-world tasks (software engineering, i.e., locating code, and deep research), but also substantially better at proactively asking clarifying questions when necessary
@sunweiwei12
Weiwei Sun
5 days
AI agents are supposed to collaborate with us to solve real-world problems, but can they really? Even the most advanced models can still give us frustrating moments when we work with them closely. We argue that real-world deployment requires more than productivity (e.g., task
34
121
913
@__howardchen
Howard Chen
5 days
We let agents accumulate context freely, assuming little or no side effects. This may not be the case! Sometimes they answer political or moral questions differently, and even act differently, after reading or conducting research. More analysis in the thread!
@JiayiiGeng
Jiayi Geng
5 days
We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use. But this growing trust might be masking a hidden risk: what if their beliefs are
0
1
6
@jyangballin
John Yang
5 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26
89
364
@JiayiiGeng
Jiayi Geng
5 days
Check out more details from our paper: https://t.co/9hEOzxiosu, webpage: https://t.co/2HmyZsnmTw and code: https://t.co/SPjkfcm0AQ Work done with collaborators Howard Chen (@__howardchen), Ryan Liu (@theryanliu), Manoel Horta Ribeiro (@manoelribeiro), Robb Willer (@RobbWiller),
github.com
Contribute to JiayiGeng/lm-belief-change development by creating an account on GitHub.
2
3
11
@gneubig
Graham Neubig
5 days
When you turn on that "memory" feature in ChatGPT or Claude, is it silently changing the AI's worldview? Yes! We demonstrate that discussing ethical topics can change the model's answers to moral dilemmas, and reading or researching can change the model's political leanings.
@JiayiiGeng
Jiayi Geng
5 days
We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use. But this growing trust might be masking a hidden risk: what if their beliefs are
4
9
75
@JiayiiGeng
Jiayi Geng
5 days
In summary, we find belief shifts in LM assistants can emerge gradually through ordinary use. Even without explicit persuasion, extended reading and interaction can subtly reshape their views and behaviors over time. These changes can happen quietly, accumulating with context
2
1
14
@JiayiiGeng
Jiayi Geng
5 days
What drives belief shifts: specific topic-relevant information or something broader? We identify the sentences most semantically related to each political topic, then either mask them or concatenate them. However, neither manipulation reproduces the original shifts! This suggests that
2
0
5
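A minimal sketch of the manipulation described in the tweet above: rank sentences by embedding similarity to a topic description, then either mask the most topic-relevant ones or keep only those sentences (reading "concatenate" as presenting only the selected sentences, which is an assumption here). The embedding model, k, and mask token are illustrative choices, not the paper's actual setup.

```python
# Illustrative sketch, not the paper's pipeline: select the k sentences most
# semantically similar to a topic, then mask them or keep only them.
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def top_topic_indices(sentences, topic, k=10):
    """Indices of the k sentences most semantically similar to the topic."""
    sent_emb = _model.encode(sentences, convert_to_tensor=True)
    topic_emb = _model.encode(topic, convert_to_tensor=True)
    sims = util.cos_sim(topic_emb, sent_emb)[0]  # cosine similarity per sentence
    return set(sims.topk(min(k, len(sentences))).indices.tolist())

def mask_topic_relevant(sentences, topic, k=10, mask="[MASKED]"):
    """Masking manipulation: blank out the most topic-relevant sentences."""
    idx = top_topic_indices(sentences, topic, k)
    return [mask if i in idx else s for i, s in enumerate(sentences)]

def concat_topic_relevant(sentences, topic, k=10):
    """Concatenation manipulation: keep only the most topic-relevant sentences."""
    idx = top_topic_indices(sentences, topic, k)
    return " ".join(s for i, s in enumerate(sentences) if i in idx)
```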
@JiayiiGeng
Jiayi Geng
5 days
Beliefs shift directionally! The arrows tell the story: exposure to content from one perspective shifts beliefs that way, while exposure to opposing content shifts them the opposite direction. (6/n)
1
1
9
@JiayiiGeng
Jiayi Geng
5 days
We test five social science-based persuasion techniques: information, values, norms, empathy, and elite cues. We find that information and empathy are most effective at shifting beliefs. (5/n)
1
1
9
@JiayiiGeng
Jiayi Geng
5 days
How does context length affect belief shifts? The answer depends on the task type! For intentional shifts (multi-turn interaction on a moral dilemma or a safety question): Stated beliefs change early, but behaviors shift gradually over time—creating a growing gap between what
1
0
9
@JiayiiGeng
Jiayi Geng
5 days
Do LM assistants change their beliefs as context accumulates? YES! Substantially. Surprisingly, we find belief shifts happen even through ordinary activities like reading and research with no explicit persuasion needed. We also observe that stated beliefs and behaviors don't
1
1
15
@JiayiiGeng
Jiayi Geng
5 days
To study this question, we use a three-stage protocol: record baseline belief → accumulate context → record final belief. We measure not only the stated beliefs (what they say) but also behaviors (what they do). Intuitively, belief shift depends on how relevant and targeted
2
1
17
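A minimal sketch of that three-stage loop from the tweet above, with a stand-in `chat` callable and placeholder prompts; the function names and message format are illustrative assumptions, not the paper's code.

```python
# Sketch of the protocol: baseline belief -> accumulate context -> final belief.
def measure_belief_shift(chat, belief_probe, context_turns):
    """chat: callable taking a list of {role, content} messages and returning a reply string."""
    # Stage 1: record the stated belief before any added context.
    baseline = chat([{"role": "user", "content": belief_probe}])

    # Stage 2: accumulate context, e.g. reading material or debate turns.
    history = []
    for turn in context_turns:
        history.append({"role": "user", "content": turn})
        history.append({"role": "assistant", "content": chat(history)})

    # Stage 3: re-ask the same probe with the accumulated context in place,
    # so any shift is attributable to the context rather than the prompt.
    final = chat(history + [{"role": "user", "content": belief_probe}])
    return baseline, final

# Toy usage with a stubbed model, just to show the call shape.
if __name__ == "__main__":
    dummy_chat = lambda messages: "stub reply"
    before, after = measure_belief_shift(
        dummy_chat,
        belief_probe="Should autonomous weapons be banned? Answer yes or no and explain.",
        context_turns=[
            "Here is an op-ed arguing against a ban...",
            "Summarize the strongest argument.",
        ],
    )
```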
@Jiarui_Liu_
Jiarui Liu @EMNLP 2025
6 days
Our EMNLP 2025 paper "Synthetic Socratic Debates" is being presented today in Suzhou! 📍 Poster Session 1 🕚 Nov 5, 11:00 AM (Beijing). Come chat about how LLM personas shape moral reasoning & persuasion! 🔗
arxiv.org
As large language models (LLMs) are increasingly used in morally sensitive domains, it is crucial to understand how persona traits affect their moral reasoning and persuasive behavior. We present...
1
8
25
@shannonzshen
Shannon Shen
10 days
Today's AI agents are optimized to complete tasks in one shot. But real-world tasks are iterative, with evolving goals that need collaboration with users. We introduce collaborative effort scaling to evaluate how well agents work with people—not just complete tasks 🧵
6
51
264
@nlpxuhui
Xuhui Zhou
10 days
Hoping your coding agents could understand you and adapt to your preferences? Meet TOM-SWE, our new framework for coding agents that don’t just write code, but model the user's mind persistently (ranging from general preferences to small details) arxiv: https://t.co/uznLAjgWKr
5
40
120
@yueqi_song
Yueqi Song @ EMNLP2025
12 days
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare, not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
171
1K