
FENG CHEN (@FCHEN_AI)
PhD @Stanford, computational neuroscience and deep learning.
Joined July 2022 · 70 Followers · 15 Following · 3 Media · 13 Statuses
AI has always been GPU-first. But on-device AI should be NPU-first. Today we’re launching OmniNeural-4B — the world’s first NPU-aware multimodal model, natively understanding text, images, and audio. And introducing nexaML — a generative AI inference engine that runs models on
11 · 34 · 107
The best part of this job is seeing students graduate and launch their careers! Congrats to Feng Chen, Atsushi Yamamura, Tamra Nebabu, Linnie Wharton and Daniel Kunin. They are all going on to top positions across artificial intelligence, medicine, and physics. Proud of you!
1 · 1 · 91
Proud to be part of the team behind this new open-source SOTA formal math prover! 🚀 Achieving 72.95% on MiniF2F with a simple BFS strategy. Our models are trained using expert iteration and DPO, pushing the boundaries of formal theorem proving. 📄 Paper:
arxiv.org
Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating the...
🚀 Excited to announce BFS-Prover, our state-of-the-art theorem proving system in Lean4! We've achieved 72.95% on the MiniF2F test, surpassing all previous systems including DeepSeek-Prover-v1.5, InternLM2.5-StepProver, and HunyuanProver 📈 🔥 Key innovations: - Simple
0 · 0 · 2
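A minimal, hypothetical sketch of a best-first tactic search loop of the kind such provers use: states are expanded in order of cumulative model log-probability. `suggest_tactics` (returning tactic, log-prob pairs) and `apply_tactic` are stand-ins for the policy model and the Lean 4 interaction layer; none of this is BFS-Prover's actual code, and the scoring choice is an assumption.

```python
import heapq

def best_first_prove(root_state, suggest_tactics, apply_tactic, max_nodes=10_000):
    """Best-first search over proof states, expanding the highest-scoring node.

    Scores are cumulative policy log-probabilities (an assumed, common choice);
    heapq is a min-heap, so negated scores are pushed.
    """
    counter = 0                                   # tie-breaker so heapq never compares states
    frontier = [(0.0, counter, root_state, [])]   # (-score, tie, state, tactic trace)
    expanded = 0
    while frontier and expanded < max_nodes:
        neg_score, _, state, trace = heapq.heappop(frontier)
        expanded += 1
        for tactic, logprob in suggest_tactics(state):   # candidates from the model
            new_state = apply_tactic(state, tactic)
            if new_state is None:                         # tactic failed in Lean
                continue
            if new_state.is_solved():                     # all goals closed: proof found
                return trace + [tactic]
            counter += 1
            heapq.heappush(frontier, (neg_score - logprob, counter, new_state, trace + [tactic]))
    return None                                           # budget exhausted, no proof found
```

Replacing the heap with a plain FIFO queue recovers ordinary breadth-first expansion, if that is how one reads "BFS" here.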
4/ We extend our algorithm to automated theorem proving and math QA with CoT. In theorem proving, our approach improves performance by controlling the exploitation and exploration tradeoff in proof trees. In CoT, where overconfidence is less severe, we also see performance gains.
0 · 1 · 3
3/ We propose directly optimizing for coverage in the fine-tuning loss with Direct Coverage Optimization (DCO). DCO attenuates gradients on high-confidence samples, regularizing away from overconfidence. We demonstrate superior accuracy frontiers over CE loss in MATH and MiniF2F.
1 · 1 · 2
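The thread describes DCO as attenuating gradients on high-confidence samples. The sketch below illustrates that general idea with a per-sample weight that shrinks as the model's confidence in the target grows (essentially a focal-loss-style down-weighting). It is only an illustration of the mechanism, not the DCO objective defined in the paper.

```python
import torch
import torch.nn.functional as F

def confidence_attenuated_ce(logits, targets, alpha=2.0):
    """Cross-entropy whose per-sample weight decays with model confidence.

    `alpha` sets how aggressively confident samples are down-weighted; the
    weight is computed under no_grad so it only rescales gradients.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample CE
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1)
        conf = probs.gather(1, targets[:, None]).squeeze(1)   # p(target)
        weight = (1.0 - conf) ** alpha                        # ~0 when very confident
    return (weight * ce).mean()

# Toy usage: gradients still flow, but confident examples contribute less.
logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
confidence_attenuated_ce(logits, targets).backward()
```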
1/ Our new paper: “Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning” on how to change training to better exploit test-time compute! co-led by @AllanRaventos, w/ Nan Cheng, @SuryaGanguli & @ShaulDr
https://t.co/xM49OB6sk7
1 · 5 · 19
Join us at the ML for Multiscale Processes workshop at #ICLR2025 to hear from our first three amazing keynote speakers: Qianxiao Li https://t.co/ZQG6acjlwK Sergei Gukov https://t.co/2E5KjTpHxq Charlotte Bunne
0 · 2 · 1
Come check out our #NeurIPS2024 spotlight poster on feature learning tomorrow! 📍East Exhibit Hall A-C #2102 📅Thu 12 Dec 4:30 p.m. — 7:30 p.m. PST
🌟Announcing NeurIPS spotlight paper on the transition from lazy to rich🔦 We reveal through exact gradient flow dynamics how unbalanced initializations promote rapid feature learning, co-led by @AllanRaventos and @ClementineDomi6, w/ @FCHEN_AI @klindt_david @SaxeLab @SuryaGanguli
0 · 7 · 49
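A toy numpy sketch of the quantity behind the lazy-to-rich distinction: how far the first-layer weights travel from their initialization (a rough proxy for feature learning) in a two-layer linear network, as a function of the relative scales of the two layers. The architecture, scales, and hyperparameters below are illustrative assumptions; the paper derives the exact gradient-flow dynamics rather than running such a simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_layer_movement(scale1, scale2, steps=3000, lr=1e-3, d=20, h=50, n=500):
    """Train a two-layer linear net with full-batch GD and return the relative
    Frobenius distance of W1 from its initialization (feature-movement proxy)."""
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d)                    # linear teacher targets
    W1 = scale1 * rng.standard_normal((h, d)) / np.sqrt(d)
    w2 = scale2 * rng.standard_normal(h) / np.sqrt(h)
    W1_init = W1.copy()
    for _ in range(steps):
        err = (X @ W1.T) @ w2 - y                     # residuals for 0.5 * mean squared error
        grad_W1 = np.outer(w2, err @ X) / n           # dL/dW1
        grad_w2 = W1 @ (X.T @ err) / n                # dL/dw2
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return np.linalg.norm(W1 - W1_init) / np.linalg.norm(W1_init)

# Two illustrative settings: comparable layer scales vs. a deliberately
# unbalanced pair (toy numbers chosen for the demo, not taken from the paper).
print("balanced   :", first_layer_movement(scale1=1.0, scale2=1.0))
print("unbalanced :", first_layer_movement(scale1=0.1, scale2=10.0))
```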
Want to learn about SGD's implicit bias towards simpler subnetworks generated by permutation symmetry?! Come to our NeurIPS poster session tomorrow morning 10:45 - 12:45 Hall B1+B2 (level 1) #906
Our new preprint reveals how SGD biases neural nets towards vastly simpler subnets w/ superior generalization via stochastic collapse to invariant sets & explains why prolonged large learning rates help. Co-led w/ @FCHEN_AI @atsushi_y1230 & @SuryaGanguli
https://t.co/tJKWp1Neng
1 · 7 · 75
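A toy check of the invariant sets the quoted preprint refers to: permutation symmetry means that if two hidden units start as exact clones (identical incoming and outgoing weights), gradient descent keeps them identical forever, so training stays on a set of parameters equivalent to a smaller subnetwork. The snippet below only verifies that invariance for full-batch gradient descent on a tiny tanh network; the preprint's result concerns how SGD noise drives networks *toward* such sets.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, h = 64, 5, 4

# Data and a one-hidden-layer tanh network; units 0 and 1 start as exact clones.
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))
W1 = rng.standard_normal((h, d)); W1[1] = W1[0]       # identical incoming weights
w2 = rng.standard_normal(h);      w2[1] = w2[0]       # identical outgoing weights

lr = 0.05
for _ in range(500):
    H = np.tanh(X @ W1.T)                             # hidden activations, (n, h)
    err = H @ w2 - y                                  # residuals
    grad_w2 = H.T @ err / n
    grad_W1 = ((err[:, None] * w2) * (1 - H**2)).T @ X / n
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

# The clone relation is preserved exactly: the trained network is equivalent
# to a simpler (h - 1)-unit subnetwork living on the invariant set.
print(np.allclose(W1[0], W1[1]), np.allclose(w2[0], w2[1]))   # True True
```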
1/ Our new paper led by @AllanRaventos, @mansiege, @FCHEN_AI asks when in-context learning of regression can solve fundamentally *new* problems *not* seen during pre-training, and reveals it as an emergent capability arising from a phase transition...
arxiv.org
Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises...
4 · 39 · 169
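A sketch of the kind of pretraining data such in-context regression studies use, assuming the common setup where each training sequence is generated by one of K fixed weight vectors; the pool size K is the "pretraining task diversity" knob the thread asks about. Function names and defaults are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_task_pool(K, d=8):
    """A fixed pool of K regression tasks (weight vectors) used for pretraining."""
    return rng.standard_normal((K, d)) / np.sqrt(d)

def make_icl_batch(task_pool, batch=32, n_ctx=16, noise=0.1):
    """Sample in-context sequences: each pairs x's with y = w^T x + noise,
    where w is drawn from the fixed pretraining pool for that sequence."""
    K, d = task_pool.shape
    w = task_pool[rng.integers(0, K, size=batch)]      # one task per sequence
    X = rng.standard_normal((batch, n_ctx, d))
    y = np.einsum("bnd,bd->bn", X, w) + noise * rng.standard_normal((batch, n_ctx))
    return X, y

# Low vs. high pretraining task diversity; a transformer trained on such
# batches can then be evaluated on weight vectors *outside* the pool to ask
# whether it has learned the task family or only memorized the pool.
X_lo, y_lo = make_icl_batch(make_task_pool(K=4))
X_hi, y_hi = make_icl_batch(make_task_pool(K=2**15))
```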
Excited to share our new work on how pretraining task diversity affects in-context learning.
Can in-context learning learn new tasks different from those in the pretraining data? Is this an emergent ability, i.e. does it arise from pretraining without being explicitly optimized for? How does this depend on pretraining task diversity? 🧵 1/ https://t.co/g118pWgAA9
0 · 0 · 4
Excited to share our new paper on how SGD biases neural nets towards simpler models via stochastic collapse to invariant sets.
Our new preprint reveals how SGD biases neural nets towards vastly simpler subnets w/ superior generalization via stochastic collapse to invariant sets & explains why prolonged large learning rates help. Co-led w/ @FCHEN_AI @atsushi_y1230 & @SuryaGanguli
https://t.co/tJKWp1Neng
0 · 2 · 4