
Jiayi Geng
@JiayiiGeng
Followers 1K · Following 252 · Media 7 · Statuses 59
PhD @LTIatCMU | MSE @Princeton_nlp @PrincetonPLI @cocosci_lab @PrincetonCS. Working on Multi-agent / Cognitive science & LLMs
Princeton, NJ
Joined August 2022
Using LLMs to build AI scientists is all the rage now (e.g., Google’s AI co-scientist [1] and Sakana’s Fully Automated Scientist [2]), but how much do we understand about their core scientific abilities? We know how LLMs can be vastly useful (solving complex math problems) yet
🔗 Sign up to participate:
🔗 Submit your poster:
docs.google.com
Thanks for your interest in attending the CMU AI for science workshop, to be held on Sept 12, 2025. This form is to register your interest in participating as an attendee in order to estimate...
RT @JiahaoQiu99: 🚀 Just released: "A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence"! We provide the first compre…
Check out this cool video (made by @theryanliu) for our #icml25 paper, "Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse" 🤗
A short 📹 explainer video on how LLMs can overthink in humanlike ways 😲! Had a blast presenting this at #icml2025 🥳
In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper:
arxiv.org
Chain-of-thought (CoT) prompting has become a widely used strategy for improving large language and multimodal model performance. However, it is still an open question under which settings CoT...
One of the better posters I saw today at #icml25. This gets at the root of the problems we were thinking about when we conceived and wrote the CoT paper.
RT @gaurav_ghosal: 1/ So much of privacy research is designing post-hoc methods to make models mem. free. It’s time we turn that around with…
🧐 Check out our poster 11 am today @ West-320!
Chain of thought can hurt LLM performance 🤖 Verbal (over)thinking can hurt human performance 😵💫 Are when/why they happen similar? Come find out at our poster at West-320 ⏰ 11am tomorrow! #ICML2025
RT @xiangyue96: People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that i…
RT @gneubig: What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. M…
RT @ChengleiSi: Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research…
RT @EchoShao8899: 🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them wha…
RT @AnthropicAI: New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parall…
anthropic.com
On the engineering challenges and lessons learned from building Claude's Research system
RT @yikewang_: LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍 ScienceMeter: current knowle…
RT @ChengleiSi: This year, there have been various pieces of evidence that AI agents are starting to be able to conduct scientific research…
RT @JiahaoQiu99: The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Researc…
Check out more details in our paper: Work done with amazing collaborators Howard Chen (@__howardchen), Dilip Arumugam (@Dilip_Arumugam), and Tom Griffiths (@cocosci_lab)! 🎉 Special thanks to Danqi Chen (@danqi_chen) and the people at @PrincetonPLI.
arxiv.org
Using AI to create autonomous researchers has the potential to accelerate scientific discovery. A prerequisite for this vision is understanding how well an AI model can identify the underlying...