James Zou

@james_y_zou

Followers
18K
Following
2K
Media
290
Statuses
2K

@Stanford professor. Chan-Zuckerberg investigator. Sloan Fellow. Overton Prize. @togethercompute. AI for science + health.

Palo Alto, CA
Joined August 2016
@james_y_zou
James Zou
6 days
Excited to share new works on LLMs, agents and AI for science at #NeurIPS this week!👇 Thanks to my awesome students + collaborators. Look forward to meeting old and new friends in San Diego! Let me know if you want to chat!
4
13
124
@haotian_yeee
Haotian Ye @ NeurIPS25
2 days
🤔Want a principled way to RL your diffusion model? Check out Data-regularized Reinforcement Learning (DDRL)! Post-train @nvidia #Cosmos World Foundation models with a million GPU hours! 🤯 Novel formulation ➡️ Theoretically integrates SFT into RL ➡️ Robust to Reward Hacking 🛑
2
56
166
@swangentr
Steven Wang
9 hours
This upcoming week centers on the Federal Reserve’s final policy meeting of the year, with the interest rate decision due on Wednesday. Per CME’s FedWatch, the chance of a 25bps rate cut is now 86.2%. Beyond the rate decision itself, investors will be following Jerome Powell’s
7
36
441
@StanfordHAI
Stanford HAI
2 days
“AI needs to recognize and acknowledge false beliefs and misconceptions. That’s still a big gap in current models, even the most recent ones,” says @StanfordHAI faculty affiliate @james_y_zou on AI's current blind spots: https://t.co/U31R3PAVQm
9
10
61
@james_y_zou
James Zou
3 days
Can LLMs help us interpret genetic variants?🧬 Check out our #NeurIPS2025 paper on CGBench; @oq_35 is presenting it today!
@oq_35
Owen Queen
2 months
🚀 Excited to share our new paper: CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research. Can AI truly understand scientific papers? We explore how LLMs interpret real biomedical literature, not just multiple-choice questions.🧵
1
6
51
@james_y_zou
James Zou
4 days
There's tremendous interest in multi-agent systems now, but how to optimize such systems is a big challenge. #Sirius is a powerful framework that enables teams of multiple AI agents to self-improve 💯 Check out @WanjiaZhao1203's #NeurIPS2025 poster + paper!
@WanjiaZhao1203
Wanjia Zhao✈️ NeurIPS
10 months
Introducing #SIRIUS🌟: A self-improving multi-agent LLM framework that learns from successful interactions and refines failed trajectories, enhancing college-level reasoning and competitive negotiations. 📜Preprint: https://t.co/xthe4kDiAD 💻code: https://t.co/jZuIg02OHc 1/N
7
25
163
@uaustinorg
University of Austin (UATX)
2 days
You haven't actually graduated until you've paid off your student loans.
7
6
48
@WanjiaZhao1203
Wanjia Zhao✈️ NeurIPS
5 days
I am at #NeurIPS2025 from 12/2 to 12/7! Looking forward to meeting old and new friends! Come check out our poster: 🗓️Wed, Dec 3, 11AM – 2PM 📍Exhibit Hall C,D,E #5406 🗓️Fri, Dec 5, 11AM – 2PM 📍Exhibit Hall C,D,E #1712 🗓️Sat, Dec 6, 8:00 AM – 5:00 PM 📍Upper Level Ballroom 6A
2
5
65
@EricDSun
Eric Sun
4 days
Can we predict the spatial effects of single-cell perturbations? Excited to share our preprint introducing SpatialProp, which computationally propagates single-cell transcriptomic perturbations across tissues, along with key frameworks for evaluating spatial perturbation models.
4
24
139
@qizhengz_alex
Qizheng Zhang ✈️ NeurIPS 2025
5 days
🚀 Introducing Agentic Context Engineering (ACE): a framework for self-improving language models through continuously evolving contexts (not weights). 📈 High-Performing: +10.6% on agent tasks, +8.6% on finance ⚡ Ultra-Efficient: −86.9% latency, −83.6% dollar cost 🛠
23
115
752
@IEEESpectrum
IEEE Spectrum
6 days
As AI takes on agent roles in critical fields, reasoning failures raise risks. New studies from @james_y_zou of @StanfordMed and @ylqzd2011 of @HKUniversity show how reasoning goes off the rails.
spectrum.ieee.org
As AI takes on agent roles, flawed reasoning raises risks
0
7
13
@james_y_zou
James Zou
12 days
Learning to learn by #LLM feedback, by #LLM feedback🤯 Many AI systems are optimized by LM feedback (e.g., TextGrad, DSPy, etc.)🔃. Our #neurip2025 paper introduces metaTextGrad: a powerful way to optimize all these LM optimizers ➡️ better agents. 🧵
2
11
71
@DuyHMNguyen1
Duy H. M. Nguyen
10 days
✨ Thrilled to share that our paper ExGra-Med [1] has been accepted to NeurIPS 2025! (one of three other ones accepted this year 🚀 ) ✨ Over the past year, my collaborators and I have been exploring a fundamental limitation of
2
2
8
@askalphaxiv
alphaXiv
12 days
AI is transforming scientific discovery at incredible speed 🚀 Join us at NeurIPS for our AI4Science Panel with Rafael Gómez-Bombarelli, @jeffclune and @james_y_zou as they discuss what the future of AI-enabled science might look like. Link to signup below 👇
3
12
77
@james_y_zou
James Zou
12 days
3/ Amazing work by @Kevin_GuoweiXu and @mertyuksekgonul leading this project🙇 Check out @Kevin_GuoweiXu's excellent thread for all the goods!
@Kevin_GuoweiXu
Guowei Xu @ NeurIPS 2025
13 days
Introducing #metaTextGrad🌟: a meta-optimization framework built on #TextGrad, designed to improve existing LLM optimizers by aligning them more closely with specific tasks. 📰 NeurIPS 2025 paper: https://t.co/M4Wj7TVIy4 🧑‍💻Code: https://t.co/9E0M1VrG35 📚 Slides:
0
2
7
@james_y_zou
James Zou
12 days
2/ Existing LM optimizers are broad and generic. #metaTextGrad automatically adapts them to specific tasks, greatly improving performance and efficiency. 📰 #NeurIPS2025 paper: https://t.co/DvluDROBdm 🧑‍💻 Code: https://t.co/mtkb3HGmNg 📖 Slides: https://t.co/it94UrzIYf
1
3
12
@WanjiaZhao1203
Wanjia Zhao✈️ NeurIPS
3 months
Accepted at #NeurIPS2025 🎉 Look forward to meeting everyone in SD!
@WanjiaZhao1203
Wanjia Zhao✈️ NeurIPS
10 months
Introducing #SIRIUS🌟: A self-improving multi-agent LLM framework that learns from successful interactions and refines failed trajectories, enhancing college-level reasoning and competitive negotiations. 📜Preprint: https://t.co/xthe4kDiAD 💻code: https://t.co/jZuIg02OHc 1/N
7
19
184
@Kevin_GuoweiXu
Guowei Xu @ NeurIPS 2025
13 days
Introducing #metaTextGrad🌟: a meta-optimization framework built on #TextGrad, designed to improve existing LLM optimizers by aligning them more closely with specific tasks. 📰 NeurIPS 2025 paper: https://t.co/M4Wj7TVIy4 🧑‍💻Code: https://t.co/9E0M1VrG35 📚 Slides:
1
5
34
@james_y_zou
James Zou
17 days
โšก๏ธSolving inequality proofs with LLMs is accepted as a #neurips2025 spotlight paper! Mathematical analysis often involves deriving bounds or inequalities. Here we investigate using LLMs to derive tight bounds.
@lupantech
Pan Lu @NeurIPS 2025
6 months
Do LLMs truly understand math proofs, or just guess? 🤔Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs. ➡️ https://t.co/h5f8Qv8Xlv To tackle
1
12
72
@james_y_zou
James Zou
24 days
Causal DAG is a really neat approach to generate high-quality reasoning process rewards at scale, without relying on an LLM judge! 💯 Great job @WanjiaZhao1203 @AquaHorseM @ShiJingzhe41415 w/ awesome collaborators👏
@WanjiaZhao1203
Wanjia Zhao✈️ NeurIPS
24 days
1/N Introducing🌈PRISM-Physics, a process-level, rule-based benchmark for complex physics reasoning. Each solution is modeled as a Directed Acyclic Graph of formulas, capturing causal relations between steps. A rule-based symbolic equivalence checker ensures consistent evaluation
3
13
65