CLS

@ChengleiSi

Followers
5K
Following
24K
Media
41
Statuses
3K

PhDing @stanfordnlp | teaching language models to do research

Palo Alto, California
Joined August 2018
@ChengleiSi
CLS
5 months
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
12
196
634
@deepseek_ai
DeepSeek
12 hours
🏆 World-Leading Reasoning 🔹 V3.2: Balanced inference vs. length. Your daily driver at GPT-5 level performance. 🔹 V3.2-Speciale: Maxed-out reasoning capabilities. Rivals Gemini-3.0-Pro. 🥇 Gold-Medal Performance: V3.2-Speciale attains gold-level results in IMO, CMO, ICPC World
57
244
3K
@thomasfbloom
Thomas Bloom
1 day
As the owner/maintainer of the Erdős problems website, a thread with some comments on this solution to #124: 1) This is a nice proof, which was provided by the AI from the formal statement with no human involvement and then formalised in Lean. This is already impressive!
@vladtenev
Vlad Tenev
2 days
We are on the cusp of a profound change in the field of mathematics. Vibe proving is here. Aristotle from @HarmonicMath just proved Erdős Problem #124 in @leanprover, all by itself. This problem has been open for nearly 30 years since conjectured in the paper “Complete sequences
22
112
1K
@zhs05232838
Zhihong Shao
4 days
We just shared some thoughts and results on self-verifiable mathematical reasoning. The released model, DeepSeekMath-V2, is strong on IMO-ProofBench and competitions like IMO 2025 (5/6 problems) and Putnam 2024 (a near-perfect score of 118/120). Github: https://t.co/4dMEqWxXfU
28
78
673
@AkariAsai
Akari Asai
6 days
1/ Hiring PhD students at CMU SCS (LTI/MLD) for Fall 2026 (Deadline 12/10) 🎓 I work on open, reliable LMs: augmented LMs & agents (RAG, tool use, deep research), safety (hallucinations, copyright), and AI for science, code & multilinguality, and I'm open to bold new ideas! FAQ in 🧵
16
119
575
@RedChip
RedChip
3 days
🎥 Enlivex Therapeutics $ENLV: CEO Interview Highlights Breakthrough Osteoarthritis Data In a recent interview, CEO Oren Hershkovitz shared compelling Phase I/II results from Allocetra™ in age-related osteoarthritis. With a $7B+ global osteoarthritis market, no FDA-approved
0
2
6
@pratyusha_PS
Pratyusha Sharma
10 days
📢 Some big (& slightly belated) life updates! 1. I defended my PhD at MIT this summer! 🎓 2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉 🔬 My lab will focus on empirically studying
100
89
2K
@MinyangTian1
Minyang Tian ✈️ NeurIPS
10 days
Can LLMs help physicists break new ground in real frontier research? We introduce CritPt (Complex Research using Integrated Thinking - Physics Test, pronounced "Critical Point"): the first benchmark of unpublished, realistic research-level reasoning challenges broadly spanning
13
22
156
@EpochAIResearch
Epoch AI
11 days
Benchmarking data is dominated by a single “General Capability” dimension. Is this due to good generalization across tasks, or to developers pushing on all benchmarks at once? 🧵 with some analysis, including the discovery of a “Claudiness” dimension.
7
27
276
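The "single dimension" finding above is the kind of structure a PCA over a model-by-benchmark score matrix surfaces: if one latent factor drives most benchmarks, the first principal component dominates. A minimal illustrative sketch with synthetic data (not Epoch AI's actual analysis or dataset):

```python
import numpy as np

# Synthetic score matrix: rows = models, cols = benchmarks.
# Scores are driven by one latent "general capability" factor plus noise.
rng = np.random.default_rng(0)
ability = rng.normal(size=(20, 1))            # latent factor per model
noise = 0.1 * rng.normal(size=(20, 8))
scores = ability @ np.ones((1, 8)) + noise    # 20 models x 8 benchmarks

# Center the matrix and inspect how much variance each principal
# component explains via the singular values.
centered = scores - scores.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(f"PC1 explains {explained[0]:.0%} of the variance")
```

When PC1 explains nearly everything, per-benchmark rankings collapse into one leaderboard; residual components (like the "Claudiness" dimension the thread mentions) live in what's left over.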
@finbarrtimbers
finbarr
11 days
The OlmoRL infrastructure was 4x faster than Olmo 2 and made it much cheaper to run experiments. Some of the changes: 1. continuous batching 2. in-flight updates 3. active sampling 4. many many improvements to our multi-threading code
4
15
178
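The first item in that list, continuous batching, is a standard LLM-serving technique: instead of waiting for every sequence in a batch to finish, a finished sequence's slot is refilled immediately from the queue, so generation slots never sit idle behind the longest sequence. A toy sketch of the scheduling loop (not OlmoRL's actual implementation; `step_fn` stands in for one decode step):

```python
from collections import deque

def continuous_batching(requests, batch_size, step_fn):
    """Toy continuous-batching scheduler: refill free slots from the
    pending queue every decode step instead of once per batch."""
    pending = deque(requests)
    active = []      # sequences currently being decoded
    finished = []
    while pending or active:
        # Swap waiting requests into free slots immediately.
        while pending and len(active) < batch_size:
            active.append(pending.popleft())
        still_running = []
        for seq in active:
            seq = step_fn(seq)  # decode one token for this sequence
            (finished if seq["done"] else still_running).append(seq)
        active = still_running
    return finished

# Demo: sequences of different lengths share only 2 slots.
def toy_step(seq):
    seq["left"] -= 1
    seq["done"] = seq["left"] == 0
    return seq

requests = [{"id": i, "left": i % 3 + 1, "done": False} for i in range(6)]
done = continuous_batching(requests, batch_size=2, step_fn=toy_step)
```

In an RL setting this matters doubly, since rollout lengths vary widely and idle slots directly waste sampler GPU time.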
@soldni
Luca Soldaini 🌯 NeurIPS 2025
11 days
We are releasing a LARGE new collection of science PDFs we linearized with olmOCR! Great for our first long context model. It was fun to use synth data to boost long context, all using Olmo 2! Older bro helping younger sibling 🥹
2
4
42
@BrianHie
Brian Hie
12 days
Today in @Nature, in work led by @aditimerch, we report the ability to prompt Evo to generate functional de novo genes. You shall know a gene by the company it keeps! 1/n
7
103
543
@johnhewtt
John Hewitt
12 days
Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in central park
12
129
945
@tonyzzhao
Tony Zhao
12 days
Today, we present a step-change in robotic AI @sundayrobotics. Introducing ACT-1: A frontier robot foundation model trained on zero robot data. - Ultra long-horizon tasks - Zero-shot generalization - Advanced dexterity 🧵->
425
658
5K
@stanfordnlp
Stanford NLP Group
12 days
How Stanford researchers design human-focused AI systems: “AI products enter the real world very quickly, often without a rigorous understanding of their impact or the consequences of their use. We need to move forward with responsibility.” —@Diyi_Yang https://t.co/wO0c8LbPsK
3
10
85
@RulinShao
Rulin Shao ✈️ NeurIPS
13 days
🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics: - co-evolve with the policy model -
7
116
536
@CaimingXiong
Caiming Xiong
14 days
🤖🧠 LLM agents are becoming adept at reasoning over complex codebases, yet they remain static, rarely learning from their own experience. We introduce SAGE (Self Abstraction from Grounded Experience), a framework that enables agents to reflect on past rollouts, distill
3
17
86
@tanmayxagarwal
Tanmay Agarwal
16 days
It's 2023 and you go up to the 2nd Floor in Stanford's Gates CS Building. If you turn right, first on your left you would see @chelseabfinn's lab through the glass, with @tonyzzhao working on his Aloha setup. If you keep going straight, you enter the bullpen for @StanfordSVL.
5
3
156
@yoavgo
(((ل()(ل() 'yoav))))👾
17 days
as LLM-based systems improve and produce "novel research papers" that are actually correct and properly written (not a very high bar, probably around the corner) I wonder if we will have a new category when discussing research(ers): "work that could have been done by an AI".
2
3
28
@AnthropicAI
Anthropic
19 days
New Anthropic research: Project Fetch. We asked two teams of Anthropic researchers to program a robot dog. Neither team had any robotics expertise—but we let only one team use Claude. How did they do?
80
200
2K
@micahgoldblum
Micah Goldblum
18 days
An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3
40
145
1K
@tomchen0
Tong Chen @ NeurIPS
18 days
OpenAI's blog ( https://t.co/Mu05PFfPXg) points out that today’s language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility?🤔 On-policy RL with
25
123
669