Berfin Simsek
@bsimsek13
Followers: 773 · Following: 991 · Media: 9 · Statuses: 178
New York, USA
Joined December 2017
Come see our analysis of a Gaussian multi-index model at #AISTATS2025 on Sunday at Hall A—E 183. My favorite result: when the dot product between the ideal vectors exceeds a threshold, gradient flow fails to separate them under correlation loss! 😎
Very exciting research direction! 🙌🏼
NY times article on expMath, my AI for math @darpa program, with commentary from mathematicians Andrew Granville, Bryna Kra, Jordan Ellenberg, and context from @the_IAS professor @alondra and @AnthropicAI CEO @DarioAmodei. https://t.co/y3CbJ8isM4
cross-posted @bsimsek.bsky.social (I'm slowly migrating there)
This was done in collaboration with Amire Bendjeddou & Daniel Hsu. 🙌 Paper link:
Another result is a tight characterization of the time complexity of gradient flow early in the dynamics, generalizing the known result for single-index models to multi-index models. The generalization applies to arbitrary geometries as well. 🙃
Below this threshold, a mild overparameterization (a log k factor for k index vectors) is sufficient for the neurons to match the ideal vectors, by a coupon-collector argument. 🤠
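The log k factor in the tweet above has the flavor of the classic coupon-collector bound: if each randomly initialized neuron independently lands in one of k "slots" (one per ideal vector), then roughly k log k neurons are needed before every slot is hit. A minimal simulation of that counting argument (the function names and constants here are illustrative, not from the paper):

```python
import random

def coupon_collector_draws(k, rng):
    """Draw uniformly from k coupon types until all k have been seen;
    return the total number of draws needed."""
    seen = set()
    draws = 0
    while len(seen) < k:
        seen.add(rng.randrange(k))
        draws += 1
    return draws

def expected_draws(k):
    """Exact expectation k * H_k, where H_k is the k-th harmonic number
    (so the expected number of draws scales like k log k)."""
    return k * sum(1.0 / i for i in range(1, k + 1))

if __name__ == "__main__":
    rng = random.Random(0)
    k = 20
    trials = 2000
    mean = sum(coupon_collector_draws(k, rng) for _ in range(trials)) / trials
    print(f"empirical mean: {mean:.1f}, exact k*H_k: {expected_draws(k):.1f}")
```

For k = 20 the exact expectation k·H_k is about 72 draws, i.e. a ~3.6x (≈ log k) blow-up over the k = 20 that perfect matching would need.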
When the ideal vectors form an equiangular frame, all learned weights converge to their average (no matter how much overparameterization is used); past a certain threshold of the dot product, that average turns from a saddle into a local minimum. 😲
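A minimal numpy sketch of the collapse-to-average phenomenon described above, under simplifying assumptions: the full correlation loss is replaced by its quadratic (second-Hermite) proxy, so each neuron independently maximizes (w·u₁)² + (w·u₂)² over the unit sphere. The dimension, dot product c, and step size are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 10, 0.8  # dimension; dot product between the two ideal vectors

# Two unit "ideal" vectors with <u1, u2> = c (a large, above-threshold overlap).
u1 = rng.standard_normal(d); u1 /= np.linalg.norm(u1)
v = rng.standard_normal(d); v -= (v @ u1) * u1; v /= np.linalg.norm(v)
u2 = c * u1 + np.sqrt(1 - c**2) * v

# Quadratic proxy objective for one neuron: maximize w^T M w on the sphere.
M = np.outer(u1, u1) + np.outer(u2, u2)

def train(w, steps=500, eta=0.1):
    """Projected gradient ascent on the sphere for the proxy objective."""
    for _ in range(steps):
        w = w + eta * (M @ w)   # ascent step
        w /= np.linalg.norm(w)  # project back to the unit sphere
    return w

w1 = train(rng.standard_normal(d))
w2 = train(rng.standard_normal(d))

avg = (u1 + u2) / np.linalg.norm(u1 + u2)  # average of the ideal vectors
print(abs(w1 @ avg), abs(w2 @ avg), abs(w1 @ w2))
```

In this proxy, the average direction is the top eigenvector of M (eigenvalue 1 + c versus 1 − c), so every neuron collapses onto it up to sign regardless of how many neurons are trained — a caricature of the failure-to-separate regime, not the paper's actual dynamics.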
Searching for an exact inverse map from learned weights to ideal (concept) vectors is an intricate geometry question, even for "simple" idealized models. 🧐
I now have a new account on Bluesky: https://t.co/rroIdJtlF8. Follow me!
Here is a link to my talk on distillation for neural networks at Les Houches (https://t.co/DPdLT3HEth), together with many other talks on algorithmic theories of learning 🙌 Thanks to the organizers @_brloureiro and Vittorio
I don’t mind if o1 does not think clearly like humans. It’s great for computing formulas like integrals, even better in combination with Wolfram alpha 🙌🏼
o1 may be superhuman in some respects, but its ability to think clearly mathematically about integration is still not equal to a strong high schooler's.
Ekin Akyürek (@akyurekekin) builds tools for understanding & controlling algorithms that underlie reasoning in language models. You’ve likely seen his work on in-context learning; I'm just as excited about past work on linguistic generalization & future work on test-time scaling.
6/n My application package is available at https://t.co/N5QfbVdsyL, feel free to reach out! (n=6)
5/n I'm enthusiastic about championing Gaussian multi-index models as mathematically analyzable and insightful models for MLPs. I continue developing new results for this model; it is fun!
4/n Here is a blog post: https://t.co/XfBW2Vc07e. You may enjoy this exposition if you like Toy Models of Superposition.
bsimsek.com
It is important to understand how large models represent knowledge to make them efficient and safe. We study a toy model of neural nets that exhibits non-linear dynamics and phase transition....
3/n It is unclear whether LLMs in the wild can be robustly interpreted by ad-hoc methods. My approach is to analyze toy models that give insight into the non-linear feature compression of MLPs. Fascinating math challenges are to be expected on this journey!
2/n Studying the loss landscape is essential for understanding optimization and generalization in deep learning. I developed a combinatorial complexity framework that quantifies the non-convexity of deep learning loss landscapes.
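To give a flavor of the combinatorics behind that non-convexity (a toy illustration of the counting, not the framework itself): permuting hidden neurons leaves a network's function unchanged, so a single critical point with m neurons appears in up to m! permuted copies across the landscape, with fewer copies when some neurons coincide.

```python
from math import factorial
from collections import Counter

def permutation_copies(neuron_values):
    """Number of distinct parameter-space copies of a critical point
    obtained by permuting hidden neurons: m! divided by the factorials
    of the multiplicities of repeated neuron values (swapping two
    identical neurons yields the same parameter vector)."""
    m = len(neuron_values)
    copies = factorial(m)
    for mult in Counter(neuron_values).values():
        copies //= factorial(mult)
    return copies

# A network of 4 hidden neurons where two neurons coincide:
print(permutation_copies(["a", "a", "b", "c"]))  # 4!/2! = 12
```

With all neurons distinct the count is the full m!, which is one source of the combinatorial explosion of critical points in overparameterized landscapes.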
📢 I'm on the faculty job market this year! My research explores the foundations of deep learning and analyzes learning and feature geometry for Gaussian inputs. I detail my major contributions below 👇 Retweet if you find it interesting and help me spread the word! My DMs are open. 1/n
The Center for Computational Mathematics at Flatiron Institute is hiring research fellows (postdocs) to start next year -- applications are due December 15. This would be a great position for those working in e.g. the theory/science of deep learning. https://t.co/TdsHvFEs3k