
Kaiyu Yang
@KaiyuYang4
Followers: 4K · Following: 1K · Media: 30 · Statuses: 272
Research Scientist at @Meta Fundamental AI Research (FAIR), New York. Previously: Postdoc @Caltech, PhD @PrincetonCS, Undergrad @Tsinghua_Uni.
New York, NY
Joined June 2019
Excited to share that the Workshop on Mathematical Reasoning and AI (MATH-AI) will be at NeurIPS 2025!
Dec 6 or 7 (TBD), 2025, San Diego, California
Thanks to our dedicated team of organizers: @HanSineng, @lupantech, @weixiong_1, @ericzelikman, @Yong18850571, @uniq_zz, Soonho Kong, @hhexiy, @dawnsongtweets, @prfsanjeevarora.
We need reviewers to help maintain our scientific quality! If you're interested in reviewing MATH-AI submissions, please sign up here. Reviewers play a vital role; thanks for your contributions!
docs.google.com
The Workshop on Mathematical Reasoning and AI (MATH-AI) at NeurIPS 2025 aims to bring together diverse participants from a wide range of backgrounds, institutions, and disciplines to explore a...
Submit your 4-page, non-archival workshop papers to MATH-AI.
Deadline (tentative): Aug 29, 2025 AoE.
Info & CFP:
Submission via OpenReview:
All accepted papers will be presented as posters, with a few selected orals and…
openreview.net
Welcome to the OpenReview homepage for NeurIPS 2025 Workshop MATH-AI
We have a stellar lineup of speakers:
* Swarat Chaudhuri (UT Austin & Google DeepMind) @swarat
* Weizhu Chen (Microsoft) @WeizhuChen
* Yejin Choi (Stanford & NVIDIA) @YejinChoinka
* Hannaneh Hajishirzi (UW & AI2)
* Heng Ji (UIUC) @hengjinlp
* Chi Jin (Princeton) @chijinML
* …
RT @WendaLi8: Lovely to see the impressive performance of the Seed Prover developed by the ByteDance Seed team at IMO 2025, achieving a si…
leanprover.zulipchat.com
Browse the publicly accessible channels in Lean without logging in.
RT @AlexKontorovich: Another AI system, ByteDance's SeedProver solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended…
RT @chijinML: While IMO is trending, our model leads on college-level math (Putnam Benchmark), nearly doubling the problems solved by prior…
RT @demishassabis: Official results are in: Gemini achieved gold-medal level in the International Mathematical Olympiad! An advanced ver…
deepmind.google
Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world's most prestigious competition for young…
RT @Dorialexander: SOTA on PutnamBench with a 32b model (and highly competitive 8b): Goedel team is not messing around. Unsurprisingly mos…
RT @prfsanjeevarora: Formal math taking off at @PrincetonPLI! New Goedel-Prover v2 8B model matches 2.5 month old Deepseek V2 prover 671B…
RT @chijinML: Huge milestone from our Goedel-Prover team: we've just released a new state-of-the-art model (8B & 32B) for automated theor…
Our Goedel-Prover-V2 doubled the SOTA Pass@32 performance on PutnamBench with a 20x smaller model, making it the strongest open-source theorem prover to date!
(1/4) Introducing Goedel-Prover V2: the strongest open-source theorem prover to date.
* #1 on PutnamBench: solves 64 problems, with far less compute.
* New SOTA on MiniF2F: our 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B's 82.4%.
* 8B > 671B: Our 8B…
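For readers unfamiliar with the metric: Pass@32 counts a problem as solved if at least one of 32 sampled proofs verifies. A minimal sketch of the standard unbiased pass@k estimator commonly used for such numbers (whether this exact estimator was used for the figures above is an assumption):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn without replacement from n generations
    (c of which are correct) is correct."""
    if n - c < k:
        # Fewer incorrect samples than k: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 32 samples per problem and 4 verified proofs:
print(pass_at_k(32, 4, 32))  # -> 1.0, since any draw of all 32 includes a correct proof
```

Per-problem scores are then averaged over the benchmark to get the reported Pass@32.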
RT @swarat: Passionate about frontier AI models, classical symbolic reasoning, and safe/secure software? Consider applying for this positio…
job-boards.greenhouse.io
RT @dawnsongtweets: 1/ AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: CyberGym: AI agents discov…
With LLMs increasingly used in software development, the bottleneck will move from writing code to reasoning about code (review, testing, debugging, and verification). Dynamically typed languages like Python are popular because they made code easy to write. However, the future…
1/ Introducing VERINA: a high-quality benchmark for verifiable code generation. As LLMs are increasingly used to generate software, we need more than just working code; we need formal guarantees of correctness. VERINA offers a rigorous and modular framework for evaluating LLMs.
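To make "formal guarantees of correctness" concrete, here is an illustrative sketch (not taken from the VERINA benchmark itself) of what verified code generation targets in Lean 4: a function paired with theorems pinning down its behavior, which the proof assistant checks mechanically.

```lean
-- Illustrative example: code plus machine-checked specification.
-- `myMax` is the code; the theorems are its formal guarantee.
def myMax (a b : Nat) : Nat :=
  if a ≥ b then a else b

-- The result is at least as large as either input.
theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax; split <;> omega

theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax; split <;> omega
```

A benchmark for verifiable code generation can then score not just whether generated code runs, but whether such accompanying specifications and proofs are produced and accepted by the checker.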