Jiayi Geng Profile
Jiayi Geng

@JiayiiGeng

Followers
994
Following
200
Media
7
Statuses
49

Incoming CS PhD @LTIatCMU | MSE @Princeton_nlp @PrincetonPLI @cocosci_lab @PrincetonCS. Working on Multi-agent / Cognitive science & LLMs

Princeton, NJ
Joined August 2022
@JiayiiGeng
Jiayi Geng
2 months
Using LLMs to build AI scientists is all the rage now (e.g., Google’s AI co-scientist [1] and Sakana’s Fully Automated Scientist [2]), but how much do we understand about their core scientific abilities? We know how LLMs can be vastly useful (solving complex math problems) yet
11
78
489
@JiayiiGeng
Jiayi Geng
12 days
RT @xiangyue96: People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that i…
0
121
0
@JiayiiGeng
Jiayi Geng
15 days
RT @gneubig: What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. M…
0
16
0
@JiayiiGeng
Jiayi Geng
15 days
RT @ChengleiSi: Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research…
0
165
0
@JiayiiGeng
Jiayi Geng
27 days
I'm thrilled to share that I've moved to Pittsburgh and joined NeuLab at CMU as a research intern this summer, advised by @gneubig! I'll also start my PhD @LTIatCMU this fall. Feel free to reach out if you're interested in chatting about multi-agent systems, LLMs for scientific…
11
13
369
@JiayiiGeng
Jiayi Geng
1 month
RT @EchoShao8899: 🚨 70 million US workers are about to face their biggest workplace transition due to AI agents. But nobody asks them wha…
0
130
0
@JiayiiGeng
Jiayi Geng
1 month
RT @AnthropicAI: New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parall…
0
730
0
@JiayiiGeng
Jiayi Geng
1 month
RT @yikewang_: LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowle…
0
54
0
@JiayiiGeng
Jiayi Geng
2 months
RT @ChengleiSi: This year, there have been various pieces of evidence that AI agents are starting to be able to conduct scientific research…
0
46
0
@JiayiiGeng
Jiayi Geng
2 months
RT @JiahaoQiu99: The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Researc…
0
29
0
@JiayiiGeng
Jiayi Geng
2 months
[1] [2] [3] (10/10)
1
3
4
@JiayiiGeng
Jiayi Geng
2 months
Check out more details in our paper. Work done with amazing collaborators Howard Chen (@__howardchen), Dilip Arumugam (@Dilip_Arumugam), and Tom Griffiths (@cocosci_lab)! 🎉 Special thanks to Danqi Chen (@danqi_chen) and people from @PrincetonPLI.
1
2
12
@JiayiiGeng
Jiayi Geng
2 months
In summary, we show that LLMs still struggle to fully utilize passive observations to form hypotheses, and that allowing active intervention mitigates some of the failure modes. By evaluating their ability to reverse-engineer black-box systems, we offer a principled and controlled…
1
0
8
@JiayiiGeng
Jiayi Geng
2 months
So, what makes active intervention so effective? It allows the LLM to strategically design its queries to test and iteratively refine its hypotheses about the black-box system. (7/n)
1
0
3
@JiayiiGeng
Jiayi Geng
2 months
Active intervention effectively mitigates two common failure modes across all three black-box systems we studied:
1) Overcomplication – the LLM tends to construct overly complex hypotheses;
2) Overlooking – the LLM neglects observations and draws overly generic conclusions without
1
0
9
@JiayiiGeng
Jiayi Geng
2 months
🤔 Can LLMs successfully transfer their experiment data and findings to other LLMs? The answer is not quite! Transferred interventions consistently underperform compared to LLMs conducting their own active interventions. (5/n)
1
1
6
@JiayiiGeng
Jiayi Geng
2 months
🤔 Why do interventions help? Do they result in more informative observations, or is it the process of generating the interventions itself that matters? Inspired by the passive-yoked design from human learning studies [3], we find that active learning outperforms passive
1
0
9
@JiayiiGeng
Jiayi Geng
2 months
🤔 How well can LLMs infer the internal mechanisms of black-box systems from passive observations?
We looked at three kinds of black-box systems inspired by cognitive studies:
1) list-mapping programs
2) rules of formal languages
3) parameters of math equations
Our findings
1
2
13
@JiayiiGeng
Jiayi Geng
2 months
Doing science requires several skills:
1) Performing inductive reasoning based on passively observed data;
2) Actively interacting with a system to collect informative data and reduce uncertainty about its internal mechanisms;
3) Communicating the results.
Based on these
1
1
13
@JiayiiGeng
Jiayi Geng
3 months
RT @yueqi_song: Humans can perform complex reasoning without relying on specific domain knowledge, but can multimodal models truly do that…
0
43
0