
FENG CHEN (@FCHEN_AI)
PhD @Stanford, computational neuroscience and deep learning.
Joined July 2022 · 70 Followers · 15 Following · 3 Media · 13 Statuses
AI has always been GPU-first. But on-device AI should be NPU-first. Today we’re launching OmniNeural-4B — the world’s first NPU-aware multimodal model, natively understanding text, images, and audio. And introducing nexaML — a generative AI inference engine that runs models on
11 · 34 · 107
The best part of this job is seeing students graduate and launch their careers! Congrats to Feng Chen, Atsushi Yamamura, Tamra Nebabu, Linnie Wharton and Daniel Kunin. They are all going on to top positions across artificial intelligence, medicine, and physics. Proud of you!
1 · 1 · 91
Proud to be part of the team behind this new open-source SOTA formal math prover! 🚀 Achieving 72.95% on MiniF2F with a simple BFS strategy. Our models are trained using expert iteration and DPO, pushing the boundaries of formal theorem proving. 📄 Paper:
arxiv.org
Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating the...
🚀 Excited to announce BFS-Prover, our state-of-the-art theorem proving system in Lean4! We've achieved 72.95% on the MiniF2F test, surpassing all previous systems including DeepSeek-Prover-v1.5, InternLM2.5-StepProver, and HunyuanProver 📈 🔥 Key innovations: - Simple
0 · 0 · 2
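A minimal, hypothetical sketch of a best-first tactic search loop of the kind such provers use: states are expanded in order of cumulative model log-probability. `suggest_tactics` (returning tactic, log-prob pairs) and `apply_tactic` are stand-ins for the policy model and the Lean 4 interaction layer; none of this is BFS-Prover's actual code, and the scoring choice is an assumption.

```python
import heapq

def best_first_prove(root_state, suggest_tactics, apply_tactic, max_nodes=10_000):
    """Best-first search over proof states, expanding the highest-scoring node.

    Scores are cumulative policy log-probabilities (an assumed, common choice);
    heapq is a min-heap, so negated scores are pushed.
    """
    counter = 0                                   # tie-breaker so heapq never compares states
    frontier = [(0.0, counter, root_state, [])]   # (-score, tie, state, tactic trace)
    expanded = 0
    while frontier and expanded < max_nodes:
        neg_score, _, state, trace = heapq.heappop(frontier)
        expanded += 1
        for tactic, logprob in suggest_tactics(state):   # candidates from the model
            new_state = apply_tactic(state, tactic)
            if new_state is None:                         # tactic failed in Lean
                continue
            if new_state.is_solved():                     # all goals closed: proof found
                return trace + [tactic]
            counter += 1
            heapq.heappush(frontier, (neg_score - logprob, counter, new_state, trace + [tactic]))
    return None                                           # budget exhausted, no proof found
```

Replacing the heap with a plain FIFO queue recovers ordinary breadth-first expansion, if that is how one reads "BFS" here.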
4/ We extend our algorithm to automated theorem proving and math QA with CoT. In theorem proving, our approach improves performance by controlling the exploitation and exploration tradeoff in proof trees. In CoT, where overconfidence is less severe, we also see performance gains.
0 · 1 · 3
3/ We propose directly optimizing for coverage in the fine-tuning loss with Direct Coverage Optimization (DCO). DCO attenuates gradients on high-confidence samples, regularizing away from overconfidence. We demonstrate superior accuracy frontiers over CE loss in MATH and MiniF2F.
1 · 1 · 2
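The thread describes DCO as attenuating gradients on high-confidence samples. The sketch below illustrates that general idea with a per-sample weight that shrinks as the model's confidence in the target grows (essentially a focal-loss-style down-weighting). It is only an illustration of the mechanism, not the DCO objective defined in the paper.

```python
import torch
import torch.nn.functional as F

def confidence_attenuated_ce(logits, targets, alpha=2.0):
    """Cross-entropy whose per-sample weight decays with model confidence.

    `alpha` sets how aggressively confident samples are down-weighted; the
    weight is computed under no_grad so it only rescales gradients.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample CE
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1)
        conf = probs.gather(1, targets[:, None]).squeeze(1)   # p(target)
        weight = (1.0 - conf) ** alpha                        # ~0 when very confident
    return (weight * ce).mean()

# Toy usage: gradients still flow, but confident examples contribute less.
logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
confidence_attenuated_ce(logits, targets).backward()
```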
1/ Our new paper: “Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning” on how to change training to better exploit test-time compute! co-led by @AllanRaventos, w/ Nan Cheng, @SuryaGanguli & @ShaulDr
https://t.co/xM49OB6sk7
1 · 5 · 19
Join us at the ML for Multiscale Processes workshop at #ICLR2025 to hear from our first three amazing keynote speakers: Qianxiao Li https://t.co/ZQG6acjlwK Sergei Gukov https://t.co/2E5KjTpHxq Charlotte Bunne
0 · 2 · 1
Come check out our #NeurIPS2024 spotlight poster on feature learning tomorrow! 📍East Exhibit Hall A-C #2102 📅Thu 12 Dec 4:30 p.m. — 7:30 p.m. PST
🌟Announcing NeurIPS spotlight paper on the transition from lazy to rich🔦 We reveal through exact gradient flow dynamics how unbalanced initializations promote rapid feature learning, co-led by @AllanRaventos and @ClementineDomi6, w/ @FCHEN_AI @klindt_david @SaxeLab @SuryaGanguli
0 · 7 · 49
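A toy numpy sketch of the quantity behind the lazy-to-rich distinction: how far the first-layer weights travel from their initialization (a rough proxy for feature learning) in a two-layer linear network, as a function of the relative scales of the two layers. The architecture, scales, and hyperparameters below are illustrative assumptions; the paper derives the exact gradient-flow dynamics rather than running such a simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_layer_movement(scale1, scale2, steps=3000, lr=1e-3, d=20, h=50, n=500):
    """Train a two-layer linear net with full-batch GD and return the relative
    Frobenius distance of W1 from its initialization (feature-movement proxy)."""
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d)                    # linear teacher targets
    W1 = scale1 * rng.standard_normal((h, d)) / np.sqrt(d)
    w2 = scale2 * rng.standard_normal(h) / np.sqrt(h)
    W1_init = W1.copy()
    for _ in range(steps):
        err = (X @ W1.T) @ w2 - y                     # residuals for 0.5 * mean squared error
        grad_W1 = np.outer(w2, err @ X) / n           # dL/dW1
        grad_w2 = W1 @ (X.T @ err) / n                # dL/dw2
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return np.linalg.norm(W1 - W1_init) / np.linalg.norm(W1_init)

# Two illustrative settings: comparable layer scales vs. a deliberately
# unbalanced pair (toy numbers chosen for the demo, not taken from the paper).
print("balanced   :", first_layer_movement(scale1=1.0, scale2=1.0))
print("unbalanced :", first_layer_movement(scale1=0.1, scale2=10.0))
```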
Want to learn about SGD's implicit bias towards simpler subnetworks generated by permutation symmetry?! Come to our NeurIPS poster session tomorrow morning 10:45 - 12:45 Hall B1+B2 (level 1) #906
Our new preprint reveals how SGD biases neural nets towards vastly simpler subnets w/ superior generalization via stochastic collapse to invariant sets & explains why prolonged large learning rates help. Co-led w/ @FCHEN_AI @atsushi_y1230 & @SuryaGanguli
https://t.co/tJKWp1Neng
1 · 7 · 75
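A toy check of the invariant sets the quoted preprint refers to: permutation symmetry means that if two hidden units start as exact clones (identical incoming and outgoing weights), gradient descent keeps them identical forever, so training stays on a set of parameters equivalent to a smaller subnetwork. The snippet below only verifies that invariance for full-batch gradient descent on a tiny tanh network; the preprint's result concerns how SGD noise drives networks *toward* such sets.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, h = 64, 5, 4

# Data and a one-hidden-layer tanh network; units 0 and 1 start as exact clones.
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))
W1 = rng.standard_normal((h, d)); W1[1] = W1[0]       # identical incoming weights
w2 = rng.standard_normal(h);      w2[1] = w2[0]       # identical outgoing weights

lr = 0.05
for _ in range(500):
    H = np.tanh(X @ W1.T)                             # hidden activations, (n, h)
    err = H @ w2 - y                                  # residuals
    grad_w2 = H.T @ err / n
    grad_W1 = ((err[:, None] * w2) * (1 - H**2)).T @ X / n
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

# The clone relation is preserved exactly: the trained network is equivalent
# to a simpler (h - 1)-unit subnetwork living on the invariant set.
print(np.allclose(W1[0], W1[1]), np.allclose(w2[0], w2[1]))   # True True
```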
1/ Our new paper led by @AllanRaventos, @mansiege, @FCHEN_AI asks when in-context learning of regression can solve fundamentally *new* problems *not* seen during pre-training, and reveals it as an emergent capability arising from a phase transition...
arxiv.org
Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises...
4 · 39 · 169
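A sketch of the kind of pretraining data such in-context regression studies use, assuming the common setup where each training sequence is generated by one of K fixed weight vectors; the pool size K is the "pretraining task diversity" knob the thread asks about. Function names and defaults are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_task_pool(K, d=8):
    """A fixed pool of K regression tasks (weight vectors) used for pretraining."""
    return rng.standard_normal((K, d)) / np.sqrt(d)

def make_icl_batch(task_pool, batch=32, n_ctx=16, noise=0.1):
    """Sample in-context sequences: each pairs x's with y = w^T x + noise,
    where w is drawn from the fixed pretraining pool for that sequence."""
    K, d = task_pool.shape
    w = task_pool[rng.integers(0, K, size=batch)]      # one task per sequence
    X = rng.standard_normal((batch, n_ctx, d))
    y = np.einsum("bnd,bd->bn", X, w) + noise * rng.standard_normal((batch, n_ctx))
    return X, y

# Low vs. high pretraining task diversity; a transformer trained on such
# batches can then be evaluated on weight vectors *outside* the pool to ask
# whether it has learned the task family or only memorized the pool.
X_lo, y_lo = make_icl_batch(make_task_pool(K=4))
X_hi, y_hi = make_icl_batch(make_task_pool(K=2**15))
```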
Excited to share our new work on how pretraining task diversity affects in-context learning.
Can in-context learning learn new tasks different from those in the pretraining data? Is this an emergent ability, i.e. does it arise from pretraining without being explicitly optimized for? How does this depend on pretraining task diversity? 🧵 1/ https://t.co/g118pWgAA9
0 · 0 · 4
Excited to share our new paper on how SGD biases neural nets towards simpler models via stochastic collapse to invariant sets.
Our new preprint reveals how SGD biases neural nets towards vastly simpler subnets w/ superior generalization via stochastic collapse to invariant sets & explains why prolonged large learning rates help. Co-led w/ @FCHEN_AI @atsushi_y1230 & @SuryaGanguli
https://t.co/tJKWp1Neng
0 · 2 · 4