Jiayi Geng Profile
Jiayi Geng

@JiayiiGeng

Followers
1K
Following
252
Media
7
Statuses
59

PhD @LTIatCMU | MSE @Princeton_nlp @PrincetonPLI @cocosci_lab @PrincetonCS. Working on Multi-agent / Cognitive science & LLMs

Princeton, NJ
Joined August 2022
@JiayiiGeng
Jiayi Geng
3 months
Using LLMs to build AI scientists is all the rage now (e.g., Google’s AI co-scientist [1] and Sakana’s Fully Automated Scientist [2]), but how much do we understand about their core scientific abilities? We know how LLMs can be vastly useful (solving complex math problems) yet
11
80
493
@JiayiiGeng
Jiayi Geng
9 days
📢 We're thrilled to announce the CMU AI for Science Workshop on Sept 12 at CUC-MPW! Featuring an amazing lineup of speakers:
- Akari Asai (AI2/CMU)
- Gabe Gomes (CMU)
- Chenglei Si (Stanford)
- Keyon Vafa (Harvard)
Join us on campus, submit your poster & register here:
1
13
127
@JiayiiGeng
Jiayi Geng
1 month
RT @JiahaoQiu99: 🚀 Just released: "A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence"! We provide the first compre…
0
39
0
@JiayiiGeng
Jiayi Geng
1 month
Check out this cool video (made by @theryanliu) for our #icml25 paper, "Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse" 🤗
@theryanliu
Ryan Liu @ ICML, CogSci
1 month
A short 📹 explainer video on how LLMs can overthink in humanlike ways 😲! had a blast presenting this at #icml2025 🥳
0
0
12
@JiayiiGeng
Jiayi Geng
2 months
In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper:
arxiv.org
Chain-of-thought (CoT) prompting has become a widely used strategy for improving large language and multimodal model performance. However, it is still an open question under which settings CoT...
@edchi
Ed H. Chi
2 months
One of the better posters I saw today at #icml25. This gets at the root of the problems we were thinking about when we conceived and wrote the CoT paper.
1
11
83
@JiayiiGeng
Jiayi Geng
2 months
RT @gaurav_ghosal: 1/ So much of privacy research is designing post-hoc methods to make models mem. free. It’s time we turn that around with…
0
23
0
@JiayiiGeng
Jiayi Geng
2 months
🧐 Check out our poster at 11 am today @ West-320!
@theryanliu
Ryan Liu @ ICML, CogSci
2 months
Chain of thought can hurt LLM performance 🤖. Verbal (over)thinking can hurt human performance 😵‍💫. Are when/why they happen similar? Come find out at our poster at West-320 ⏰ 11am tomorrow! #ICML2025
0
2
13
@JiayiiGeng
Jiayi Geng
2 months
RT @xiangyue96: People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that i…
0
128
0
@JiayiiGeng
Jiayi Geng
2 months
RT @gneubig: What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. M…
0
16
0
@JiayiiGeng
Jiayi Geng
2 months
RT @ChengleiSi: Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research…
0
180
0
@JiayiiGeng
Jiayi Geng
3 months
I'm thrilled to share that I've moved to Pittsburgh and joined NeuLab at CMU as a research intern this summer, advised by @gneubig! I'll also start my PhD @LTIatCMU this fall. Feel free to reach out if you're interested in chatting about multi-agent systems, LLMs for scientific.
12
13
369
@JiayiiGeng
Jiayi Geng
3 months
RT @EchoShao8899: 🚨 70 million US workers are about to face their biggest workplace transmission due to AI agents. But nobody asks them wha…
0
136
0
@JiayiiGeng
Jiayi Geng
3 months
RT @AnthropicAI: New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parall…
anthropic.com
On the engineering challenges and lessons learned from building Claude's Research system
0
721
0
@JiayiiGeng
Jiayi Geng
3 months
RT @yikewang_: LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowle…
0
55
0
@JiayiiGeng
Jiayi Geng
3 months
RT @ChengleiSi: This year, there have been various pieces of evidence that AI agents are starting to be able to conduct scientific research…
0
48
0
@JiayiiGeng
Jiayi Geng
3 months
RT @JiahaoQiu99: The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Researc…
0
29
0
@JiayiiGeng
Jiayi Geng
3 months
[1] [2] [3] (10/10).
1
3
4
@JiayiiGeng
Jiayi Geng
3 months
Check out more details in our paper. Work done with amazing collaborators Howard Chen (@__howardchen), Dilip Arumugam (@Dilip_Arumugam), and Tom Griffiths (@cocosci_lab)! 🎉 Special thanks to Danqi Chen (@danqi_chen) and people from @PrincetonPLI.
arxiv.org
Using AI to create autonomous researchers has the potential to accelerate scientific discovery. A prerequisite for this vision is understanding how well an AI model can identify the underlying...
1
2
12
@JiayiiGeng
Jiayi Geng
3 months
In summary, we show that LLMs still struggle to fully utilize passive observations to form hypotheses and allowing active intervention mitigates some of the failure modes. By evaluating their ability to reverse-engineer black-box systems, we offer a principled and controlled.
1
0
8
@JiayiiGeng
Jiayi Geng
3 months
So, what makes active intervention so effective? It allows the LLM to strategically design its queries to test and iteratively refine its hypotheses about the black-box system. (7/n)
1
0
3