FENG CHEN
@FCHEN_AI
Followers: 70 · Following: 15 · Media: 3 · Statuses: 13

PhD @Stanford, computational neuroscience and deep learning.

Joined July 2022
@nexa_ai
NEXA AI
22 days
AI has always been GPU-first. But on-device AI should be NPU-first. Today we’re launching OmniNeural-4B — the world’s first NPU-aware multimodal model, natively understanding text, images, and audio. And introducing nexaML — a generative AI inference engine that runs models on
11 replies · 34 reposts · 107 likes
@SuryaGanguli
Surya Ganguli
3 months
The best part of this job is seeing students graduate and launch their careers! Congrats to Feng Chen, Atsushi Yamamura, Tamra Nebabu, Linnie Wharton and Daniel Kunin. They are all going on to top positions across artificial intelligence, medicine, and physics. Proud of you!
1 reply · 1 repost · 91 likes
@FCHEN_AI
FENG CHEN
7 months
Proud to be part of the team behind this new open-source SOTA formal math prover! 🚀 Achieving 72.95% on MiniF2F with a simple BFS strategy. Our models are trained using expert iteration and DPO, pushing the boundaries of formal theorem proving. 📄 Paper:
arxiv.org: Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating the...
@RanXinByteDance
Ran Xin
7 months
🚀 Excited to announce BFS-Prover, our state-of-the-art theorem proving system in Lean4! We've achieved 72.95% on the MiniF2F test, surpassing all previous systems including DeepSeek-Prover-v1.5, InternLM2.5-StepProver, and HunyuanProver 📈 🔥 Key innovations: - Simple
0 replies · 0 reposts · 2 likes
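Reading the "simple BFS strategy" in these posts as best-first search over Lean4 proof states, here is a minimal sketch of such a loop in Python. It is an illustration under stated assumptions, not BFS-Prover's actual code: `propose_tactics` (a policy LLM scoring candidate tactics) and `apply_tactic` (a Lean interaction layer returning the next state) are hypothetical stand-ins, and scoring nodes by cumulative log-probability is just one common choice.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_score: float                   # heapq is a min-heap, so store the negated log-prob
    state: str = field(compare=False)  # pretty-printed proof state
    depth: int = field(compare=False, default=0)

def best_first_proof_search(root_state, propose_tactics, apply_tactic,
                            budget=600, max_depth=100):
    """Expand the most promising open proof state until a goal closes or the
    expansion budget runs out. `propose_tactics(state)` is assumed to yield
    (tactic, log_prob) pairs; `apply_tactic(state, tactic)` is assumed to
    return (next_state_or_None, solved_flag)."""
    frontier = [Node(0.0, root_state)]
    while frontier and budget > 0:
        node = heapq.heappop(frontier)         # best cumulative score first
        budget -= 1
        for tactic, log_prob in propose_tactics(node.state):
            next_state, solved = apply_tactic(node.state, tactic)
            if solved:
                return tactic                  # a real prover would reconstruct the full path
            if next_state is not None and node.depth + 1 < max_depth:
                heapq.heappush(frontier, Node(node.neg_score - log_prob,
                                              next_state, node.depth + 1))
    return None                                # no proof found within the budget
```

Expert iteration and DPO, as mentioned in the tweet, would sit around a loop like this: proofs found by the search become training data for the next round of the policy.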
@FCHEN_AI
FENG CHEN
7 months
4/ We extend our algorithm to automated theorem proving and math QA with CoT. In theorem proving, our approach improves performance by controlling the exploitation and exploration tradeoff in proof trees. In CoT, where overconfidence is less severe, we also see performance gains.
0 replies · 1 repost · 3 likes
@FCHEN_AI
FENG CHEN
7 months
3/ We propose directly optimizing for coverage in the fine-tuning loss with Direct Coverage Optimization (DCO). DCO attenuates gradients on high-confidence samples, regularizing away from overconfidence. We demonstrate superior accuracy frontiers over CE loss in MATH and MiniF2F.
1 reply · 1 repost · 2 likes
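The precise DCO objective is the one in the paper; purely as a hypothetical sketch of the stated idea (attenuating gradients on samples the model is already confident about), one could down-weight a per-sample cross-entropy once the probability of the correct label passes a threshold. The function name and the specific attenuation schedule below are illustrative choices, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def confidence_attenuated_ce(logits, targets, threshold=0.9):
    """Illustrative stand-in (not the paper's DCO): cross-entropy whose per-sample
    contribution is scaled down once the model's probability on the correct label
    exceeds `threshold`, so gradients stop sharpening already-confident samples."""
    ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample CE
    with torch.no_grad():                                     # the weights carry no gradient
        p_correct = F.softmax(logits, dim=-1).gather(1, targets[:, None]).squeeze(1)
        weight = torch.clamp((1.0 - p_correct) / (1.0 - threshold), max=1.0)
    return (weight * ce).mean()

# usage sketch
logits = torch.randn(8, 32, requires_grad=True)   # batch of 8 samples, 32 classes
targets = torch.randint(0, 32, (8,))
confidence_attenuated_ce(logits, targets).backward()
```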
@FCHEN_AI
FENG CHEN
7 months
2/ We focus on the pass@N test-time strategy and identify a misalignment between standard fine-tuning with cross-entropy loss and the pass@N coverage metric. Models trained with CE loss can be overconfident, and therefore suboptimal when tested for pass@N coverage.
1 reply · 1 repost · 2 likes
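For concreteness, pass@N coverage is typically estimated with the standard unbiased estimator: sample n >= N attempts per problem, count the c correct ones, and compute 1 - C(n-c, N) / C(n, N). A small sketch:

```python
from math import comb

def pass_at_n(n_sampled: int, n_correct: int, N: int) -> float:
    """Unbiased estimate of pass@N from n_sampled attempts with n_correct successes."""
    if n_sampled - n_correct < N:
        return 1.0                   # every size-N subset must contain a success
    return 1.0 - comb(n_sampled - n_correct, N) / comb(n_sampled, N)

print(pass_at_n(64, 3, 16))          # 3 correct out of 64 samples -> pass@16 ≈ 0.585
```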
@FCHEN_AI
FENG CHEN
7 months
1/ Our new paper: “Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning” on how to change training to better exploit test-time compute! co-led by @AllanRaventos, w/ Nan Cheng, @SuryaGanguli & @ShaulDr https://t.co/xM49OB6sk7
1 reply · 5 reposts · 19 likes
@MultiscaleAI
Multiscale AI
8 months
Join us at the ML for Multiscale Processes workshop at #ICLR2025 to hear from our first three amazing keynote speakers: Qianxiao Li https://t.co/ZQG6acjlwK Sergei Gukov https://t.co/2E5KjTpHxq Charlotte Bunne
0 replies · 2 reposts · 1 like
@KuninDaniel
Daniel Kunin
9 months
Come check out our #NeurIPS2024 spotlight poster on feature learning tomorrow! 📍East Exhibit Hall A-C #2102 📅Thu 12 Dec 4:30 p.m. — 7:30 p.m. PST
@KuninDaniel
Daniel Kunin
1 year
🌟Announcing our NeurIPS spotlight paper on the transition from lazy to rich🔦 We reveal through exact gradient flow dynamics how unbalanced initializations promote rapid feature learning. Co-led by @AllanRaventos and @ClementineDomi6, w/ @FCHEN_AI @klindt_david @SaxeLab @SuryaGanguli
0 replies · 7 reposts · 49 likes
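As a toy probe of what lazy versus rich means operationally (an illustrative setup of my own, not the exact solutions in the paper): train a small two-layer linear network under different layer-scale ratios and track how far the first-layer weights move from initialization. Trajectories that barely move the features behave lazily; those that move them a lot are in the rich, feature-learning regime.

```python
import numpy as np

def feature_movement(scale_W, scale_a, steps=3000, lr=1e-3, d=20, n=100, seed=0):
    """Relative change of the first-layer weights after full-batch gradient descent
    on a toy regression task, used as a crude proxy for feature learning."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d)                            # teacher targets
    W = scale_W * rng.normal(size=(d, d)) / np.sqrt(d)    # first layer ("features")
    a = scale_a * rng.normal(size=d) / np.sqrt(d)         # second layer
    W0 = W.copy()
    for _ in range(steps):
        err = (X @ W.T) @ a - y                           # residuals of f(x) = a.T @ W @ x
        grad_a = (X @ W.T).T @ err / n
        grad_W = np.outer(a, X.T @ err / n)
        a -= lr * grad_a
        W -= lr * grad_W
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for s_W, s_a in [(1.0, 1.0), (0.25, 4.0), (4.0, 0.25)]:
    print(f"scale_W={s_W}, scale_a={s_a}: relative feature movement "
          f"{feature_movement(s_W, s_a):.3f}")
```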
@KuninDaniel
Daniel Kunin
2 years
Want to learn about SGD's implicit bias towards simpler subnetworks generated by permutation symmetry?! Come to our NeurIPS poster session tomorrow morning 10:45 - 12:45 Hall B1+B2 (level 1) #906
@KuninDaniel
Daniel Kunin
2 years
Our new preprint reveals how SGD biases neural nets towards vastly simpler subnets w/ superior generalization via stochastic collapse to invariant sets, and explains why prolonged large learning rates help. Co-led w/ @FCHEN_AI @atsushi_y1230 & @SuryaGanguli https://t.co/tJKWp1Neng
1 reply · 7 reposts · 75 likes
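As a toy-scale probe of the phenomenon (my own illustrative setup, not the experiments in the preprint): in a two-neuron ReLU network, the set where the two permutation-symmetric neurons are identical is invariant under the training dynamics, and the claim is that SGD noise attracts nearby trajectories toward such sets. The sketch below trains with mini-batch SGD at two learning rates and simply records the distance between the neurons; how strongly it shrinks in any given run depends on the noise and learning rate, which is what the paper characterizes.

```python
import numpy as np

data_rng = np.random.default_rng(0)
x = data_rng.uniform(-1.0, 1.0, size=512)
y = np.maximum(x, 0.0)                      # teacher: a single ReLU unit

def neuron_distance_trace(lr, steps=20000, batch=8, seed=1):
    """Train f(x) = a1*relu(w1*x) + a2*relu(w2*x) with mini-batch SGD and return
    the distance between the two permutation-symmetric neurons over time."""
    rng = np.random.default_rng(seed)
    w = 1.0 + 0.5 * rng.normal(size=2)      # two hidden weights, similar init
    a = 0.5 + 0.5 * rng.normal(size=2)      # two output weights
    dists = []
    for _ in range(steps):
        idx = rng.integers(0, x.size, size=batch)
        xb, yb = x[idx], y[idx]
        pre = np.outer(xb, w)               # (batch, 2) pre-activations
        h = np.maximum(pre, 0.0)            # hidden activations
        err = h @ a - yb                    # residuals
        grad_a = h.T @ err / batch
        grad_w = ((err[:, None] * (pre > 0) * xb[:, None]) * a).mean(axis=0)
        a -= lr * grad_a
        w -= lr * grad_w
        dists.append(np.hypot(w[0] - w[1], a[0] - a[1]))
    return dists

for lr in (0.01, 0.3):
    trace = neuron_distance_trace(lr)
    print(f"lr={lr}: neuron distance {trace[0]:.3f} -> {trace[-1]:.3f}")
```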
@SuryaGanguli
Surya Ganguli
2 years
1/ Our new paper led by @AllanRaventos, @mansiege, @FCHEN_AI asks when in-context learning of regression can solve fundamentally *new* problems *not* seen during pre-training, and reveals it as an emergent capability arising from a phase transition...
arxiv.org: Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises...
4 replies · 39 reposts · 169 likes
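A sketch of the data setup this line of work studies (the dimensions, noise level, and function names below are assumptions of mine, not the paper's configuration): pretraining prompts are in-context linear regression episodes whose weight vectors come from a finite pool of K tasks, while evaluation draws fresh weight vectors from the Gaussian prior. Varying K is the pretraining task diversity knob; the question is when a transformer trained on the finite pool also solves the unseen tasks.

```python
import numpy as np

def make_prompt(w, n_context=16, noise=0.1, rng=None):
    """One ICL episode: n_context (x, y) pairs plus a query point whose y is withheld."""
    rng = rng or np.random.default_rng()
    X = rng.normal(size=(n_context + 1, w.shape[0]))
    y = X @ w + noise * rng.normal(size=n_context + 1)
    return X, y                             # last row is the query; last y is the held-out target

def pretraining_batch(task_pool, batch_size=64, rng=None):
    """Episodes whose weight vectors are drawn from the finite pretraining pool."""
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(task_pool), size=batch_size)
    return [make_prompt(task_pool[i], rng=rng) for i in idx]

d, K = 8, 32                                # input dimension and task diversity
rng = np.random.default_rng(0)
task_pool = rng.normal(size=(K, d))         # K fixed pretraining tasks
train_batch = pretraining_batch(task_pool, rng=rng)     # episodes from seen tasks
eval_prompt = make_prompt(rng.normal(size=d), rng=rng)  # episode from an unseen task
```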
@FCHEN_AI
FENG CHEN
2 years
Excited to share our new work on how pretraining task diversity affects in-context learning.
@mansiege
Mansheej Paul
2 years
Can in-context learning learn new tasks different from those in the pretraining data? Is this an emergent ability, i.e. does it arise from pretraining without being explicitly optimized for? How does this depend on pretraining task diversity? 🧵 1/ https://t.co/g118pWgAA9
0 replies · 0 reposts · 4 likes
@FCHEN_AI
FENG CHEN
2 years
Excited to share our new paper on how SGD biases neural nets towards simpler models via stochastic collapse to invariant sets.
@KuninDaniel
Daniel Kunin
2 years
Our new preprint reveals how SGD biases neural nets towards vastly simpler subnets w/ superior generalization via stochastic collapse to invariant sets, and explains why prolonged large learning rates help. Co-led w/ @FCHEN_AI @atsushi_y1230 & @SuryaGanguli https://t.co/tJKWp1Neng
0 replies · 2 reposts · 4 likes