Berfin Simsek Profile
Berfin Simsek

@bsimsek13

Followers: 773
Following: 991
Media: 9
Statuses: 178

New York, USA
Joined December 2017
@bsimsek13
Berfin Simsek
7 months
Come see our analysis of a Gaussian multi-index model at #AISTATS2025 on Sunday, Hall A-E 183. My favorite result: when the dot product between the ideal vectors exceeds a threshold, gradient flow fails to separate them under the correlation loss! 😎
1
0
12
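A minimal numpy sketch of the kind of toy experiment that probes this claim, under assumptions I chose for illustration (two unit "ideal" vectors with dot product rho, a ReLU teacher and student, empirical correlation loss, projected gradient descent on the sphere); this is not the paper's exact setting.

import numpy as np

rng = np.random.default_rng(0)
d, m, n, rho = 20, 4, 20000, 0.9   # try rho = 0.1 vs rho = 0.9

# two unit "ideal" vectors with prescribed dot product rho (illustrative setup)
v1 = np.zeros(d); v1[0] = 1.0
v2 = np.zeros(d); v2[0] = rho; v2[1] = np.sqrt(1 - rho**2)
V = np.stack([v1, v2])

X = rng.standard_normal((n, d))
y = np.maximum(X @ V.T, 0).sum(axis=1)          # teacher: sum of ReLU units

W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

for _ in range(2000):
    pre = X @ W.T
    grad = -((pre > 0) * y[:, None]).T @ X / n  # gradient of -E[f_W(x) y(x)]
    W -= 0.5 * grad
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # project back to the sphere

print(np.round(W @ V.T, 2))  # overlaps with v1, v2; near-identical rows = no separation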
@bsimsek13
Berfin Simsek
5 months
Very exciting research direction! 🙌🏼
@patrickshafto
Patrick Shafto
5 months
NY Times article on expMath, my AI-for-math @darpa program, with commentary from mathematicians Andrew Granville, Bryna Kra, and Jordan Ellenberg, and context from @the_IAS professor @alondra and @AnthropicAI CEO @DarioAmodei. https://t.co/y3CbJ8isM4
0
0
5
@bsimsek13
Berfin Simsek
7 months
cross-posted @bsimsek.bsky.social (I'm slowly migrating there)
0
0
0
@bsimsek13
Berfin Simsek
7 months
This was done in collaboration with Amire Bendjeddou & Daniel Hsu. 🙌 Paper link:
1
0
1
@bsimsek13
Berfin Simsek
7 months
Another result is a tight characterization of the time complexity of gradient flow early in the dynamics, generalizing the known result for single-index models to multi-index models. The generalization applies to arbitrary geometries as well. 🙃
1
0
0
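To see the flavor of such a time-complexity statement, here is a sketch of the standard single-index reduction, under my own simplifying assumptions (population correlation loss with a degree-s Hermite-type link, constants absorbed, overlap ODE integrated numerically from a random-init overlap of about 1/sqrt(d)); the paper's multi-index result is not reproduced here.

import numpy as np

def escape_time(d, s, dt=1e-3, target=0.5):
    # dm/dt = s * m**(s-1) * (1 - m**2): spherical gradient flow on a
    # population correlation loss proportional to -m**s (constants absorbed)
    m, t = 1.0 / np.sqrt(d), 0.0
    while m < target:
        m += dt * s * m ** (s - 1) * (1 - m ** 2)
        t += dt
    return t

for d in [100, 400, 1600]:
    print(d, round(escape_time(d, s=3), 1))  # grows like d^{(s-2)/2} = sqrt(d) for s = 3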
@bsimsek13
Berfin Simsek
7 months
Below this threshold, mild overparameterization (a log k factor for k index vectors) suffices to match the neurons to the ideal vectors, by a coupon-collector argument. 🤠
1
0
0
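The coupon-collector effect is easy to see in a Monte Carlo sketch: assign each randomly initialized unit neuron to its closest ideal vector and check whether all k ideal vectors are claimed. This is an illustrative setup of my own, not the paper's dynamics.

import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 50, 10, 200

def all_covered(m):
    # random unit ideal vectors and random unit neurons
    V = rng.standard_normal((k, d)); V /= np.linalg.norm(V, axis=1, keepdims=True)
    W = rng.standard_normal((m, d)); W /= np.linalg.norm(W, axis=1, keepdims=True)
    nearest = np.abs(W @ V.T).argmax(axis=1)  # each neuron's closest ideal vector
    return len(set(nearest)) == k             # is every ideal vector claimed?

for m in [k, 2 * k, int(2 * k * np.log(k))]:
    print(m, np.mean([all_covered(m) for _ in range(trials)]))

With m = k neurons, coverage almost never happens; around m of order k log k it becomes likely, which is exactly the coupon-collector scaling.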
@bsimsek13
Berfin Simsek
7 months
When the ideal vectors form an equiangular frame, all learned weights converge to their average (no matter how much overparameterization is used); this average turns from a saddle into a local minimum once the dot product exceeds a certain threshold. 😲
1
0
0
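For concreteness, an equiangular frame with pairwise dot product rho can be built from its Gram matrix, and the normalized average is the collapse point described above. This is a construction sketch only; the convergence claim itself is not verified here.

import numpy as np

k, rho = 4, 0.6
G = (1 - rho) * np.eye(k) + rho * np.ones((k, k))  # Gram matrix of the frame
V = np.linalg.cholesky(G)           # rows: unit vectors with pairwise dot rho
print(np.round(V @ V.T, 3))         # 1 on the diagonal, rho off the diagonal

avg = V.mean(axis=0)
avg /= np.linalg.norm(avg)
print(np.round(V @ avg, 3))         # by symmetry, equal overlap with every frame vector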
@bsimsek13
Berfin Simsek
7 months
Searching for an exact inverse map from learned weights to ideal (concept) vectors is an intricate geometry question, even for "simple" idealized models. 🧐
1
0
0
@bsimsek13
Berfin Simsek
8 months
I now have a new account on Bluesky https://t.co/rroIdJtlF8, follow me!
0
0
0
@bsimsek13
Berfin Simsek
8 months
Here is a link to my talk on distillation for neural networks at Les Houches https://t.co/DPdLT3HEth, together with many other talks on algorithmic theories of learning 🙌 Thanks to the organizers @_brloureiro and Vittorio
videos.univ-grenoble-alpes.fr
1
0
3
@bsimsek13
Berfin Simsek
10 months
Nice talk by Jarod Alper at JMM’25 🙌🏼
0
0
4
@bsimsek13
Berfin Simsek
10 months
I don’t mind that o1 does not think clearly like humans. It’s great for computing formulas like integrals, even better in combination with Wolfram Alpha 🙌🏼
@roydanroy
Dan Roy
10 months
o1 may be superhuman in some respects, but its ability to think clearly mathematically about integration is still not equal to a strong high schooler's.
1
0
4
@jacobandreas
Jacob Andreas
11 months
Ekin Akyürek (@akyurekekin) builds tools for understanding & controlling algorithms that underlie reasoning in language models. You’ve likely seen his work on in-context learning; I'm just as excited about past work on linguistic generalization & future work on test-time scaling.
3
7
45
@bsimsek13
Berfin Simsek
11 months
6/n My application package is available at https://t.co/N5QfbVdsyL; feel free to reach out! (n=6)
0
0
2
@bsimsek13
Berfin Simsek
11 months
5/n I'm enthusiastic about championing Gaussian multi-index models as mathematically analyzable and insightful models for MLPs. I continue to develop new results for this model; it is fun!
1
0
1
@bsimsek13
Berfin Simsek
11 months
3/n It is unclear whether LLMs in the wild can be robustly interpreted by ad-hoc methods. My approach is to analyze toy models that give insight into the non-linear feature compression performed by MLPs. Fascinating math challenges are to be expected on this journey!
1
0
1
@bsimsek13
Berfin Simsek
11 months
2/n Studying the loss landscape is essential for understanding optimization and generalization in deep learning. I developed a combinatorial complexity framework that quantifies the non-convexity of deep learning loss landscapes.
1
0
4
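One concrete source of that combinatorial non-convexity is permutation symmetry: permuting the hidden neurons of an MLP leaves its function, and hence its loss, unchanged, so every critical point comes in factorially many copies. A minimal check on an illustrative two-layer tanh network with arbitrary random weights (my example, not the framework itself):

import math
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
d, h, n = 3, 4, 100
X = rng.standard_normal((n, d))
W1, w2 = rng.standard_normal((h, d)), rng.standard_normal(h)

def f(W1, w2):
    return np.tanh(X @ W1.T) @ w2   # two-layer network output on X

base = f(W1, w2)
for p in permutations(range(h)):
    p = list(p)
    assert np.allclose(f(W1[p], w2[p]), base)  # same function for every permutation
print(f"all {math.factorial(h)} permuted weight copies compute the same function")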
@bsimsek13
Berfin Simsek
11 months
📢 I'm on the faculty job market this year! My research explores the foundations of deep learning and analyzes learning and feature geometry for Gaussian inputs. I detail my major contributions below 👇 Retweet if you find it interesting and help me spread the word! My DMs are open. 1/n
1
23
78
@deepcohen
Jeremy Cohen
1 year
The Center for Computational Mathematics at Flatiron Institute is hiring research fellows (postdocs) to start next year -- applications are due December 15. This would be a great position for those working in e.g. the theory/science of deep learning. https://t.co/TdsHvFEs3k
1
12
72