Berfin Simsek
@bsimsek13
Followers: 773 · Following: 991 · Media: 9 · Statuses: 178
New York, USA
Joined December 2017
Come see our analysis of a Gaussian multi-index model at #AISTATS2025 on Sunday at Hall A—E 183. My favorite result: when the dot product between the ideal vectors exceeds a threshold, gradient flow fails to separate them under correlation loss! 😎
Very exciting research direction! 🙌🏼
NY times article on expMath, my AI for math @darpa program, with commentary from mathematicians Andrew Granville, Bryna Kra, Jordan Ellenberg, and context from @the_IAS professor @alondra and @AnthropicAI CEO @DarioAmodei. https://t.co/y3CbJ8isM4
cross-posted @bsimsek.bsky.social (I'm slowly migrating there)
This was done in collaboration with Amire Bendjeddou & Daniel Hsu. 🙌 Paper link:
Another result is a tight characterization of the time complexity of gradient flow early in the dynamics, generalizing the known result for single-index models to multi-index models. The generalization applies to arbitrary geometries as well. 🙃
Below this threshold, a mild overparameterization (a log k factor for k index vectors) is sufficient for the neurons to match the ideal vectors, by a coupon-collector argument. 🤠
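The log k factor in the tweet above has the flavor of the classic coupon-collector bound: if each randomly initialized neuron independently lands in one of k "slots" (one per ideal vector), then roughly k log k neurons are needed before every slot is hit. A minimal simulation of that counting argument (the function names and constants here are illustrative, not from the paper):

```python
import random

def coupon_collector_draws(k, rng):
    """Draw uniformly from k coupon types until all k have been seen;
    return the total number of draws needed."""
    seen = set()
    draws = 0
    while len(seen) < k:
        seen.add(rng.randrange(k))
        draws += 1
    return draws

def expected_draws(k):
    """Exact expectation k * H_k, where H_k is the k-th harmonic number
    (so the expected number of draws scales like k log k)."""
    return k * sum(1.0 / i for i in range(1, k + 1))

if __name__ == "__main__":
    rng = random.Random(0)
    k = 20
    trials = 2000
    mean = sum(coupon_collector_draws(k, rng) for _ in range(trials)) / trials
    print(f"empirical mean: {mean:.1f}, exact k*H_k: {expected_draws(k):.1f}")
```

For k = 20 the exact expectation k·H_k is about 72 draws, i.e. a ~3.6x (≈ log k) blow-up over the k = 20 that perfect matching would need.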
When the ideal vectors form an equiangular frame, all learned weights converge to their average (no matter how much overparameterization is used); past a certain threshold of the dot product, that average turns from a saddle into a local minimum. 😲
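A minimal numpy sketch of the collapse-to-average phenomenon described above, under simplifying assumptions: the full correlation loss is replaced by its quadratic (second-Hermite) proxy, so each neuron independently maximizes (w·u₁)² + (w·u₂)² over the unit sphere. The dimension, dot product c, and step size are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 10, 0.8  # dimension; dot product between the two ideal vectors

# Two unit "ideal" vectors with <u1, u2> = c (a large, above-threshold overlap).
u1 = rng.standard_normal(d); u1 /= np.linalg.norm(u1)
v = rng.standard_normal(d); v -= (v @ u1) * u1; v /= np.linalg.norm(v)
u2 = c * u1 + np.sqrt(1 - c**2) * v

# Quadratic proxy objective for one neuron: maximize w^T M w on the sphere.
M = np.outer(u1, u1) + np.outer(u2, u2)

def train(w, steps=500, eta=0.1):
    """Projected gradient ascent on the sphere for the proxy objective."""
    for _ in range(steps):
        w = w + eta * (M @ w)   # ascent step
        w /= np.linalg.norm(w)  # project back to the unit sphere
    return w

w1 = train(rng.standard_normal(d))
w2 = train(rng.standard_normal(d))

avg = (u1 + u2) / np.linalg.norm(u1 + u2)  # average of the ideal vectors
print(abs(w1 @ avg), abs(w2 @ avg), abs(w1 @ w2))
```

In this proxy, the average direction is the top eigenvector of M (eigenvalue 1 + c versus 1 − c), so every neuron collapses onto it up to sign regardless of how many neurons are trained — a caricature of the failure-to-separate regime, not the paper's actual dynamics.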
Searching for an exact inverse map from learned weights to ideal (concept) vectors is an intricate geometry question, even for "simple" idealized models. 🧐
I now have a new account on Bluesky: https://t.co/rroIdJtlF8. Follow me!
Here is a link to my talk on distillation for neural networks at Les Houches (https://t.co/DPdLT3HEth), together with many other talks on algorithmic theories of learning 🙌 Thanks to the organizers @_brloureiro and Vittorio
I don’t mind if o1 does not think clearly like humans. It’s great for computing formulas like integrals, even better in combination with Wolfram alpha 🙌🏼
o1 may be superhuman in some respects, but its ability to think clearly mathematically about integration is still not equal to a strong high schooler's.
Ekin Akyürek (@akyurekekin) builds tools for understanding & controlling algorithms that underlie reasoning in language models. You’ve likely seen his work on in-context learning; I'm just as excited about past work on linguistic generalization & future work on test-time scaling.
6/n My application package is available at https://t.co/N5QfbVdsyL, feel free to reach out! (n=6)
5/n I'm enthusiastic about championing Gaussian multi-index models as mathematically analyzable and insightful models for MLPs. I continue developing new results for this model; it is fun!
4/n Here is a blog post: https://t.co/XfBW2Vc07e. You may enjoy this exposition if you like Toy Models of Superposition.
bsimsek.com
It is important to understand how large models represent knowledge to make them efficient and safe. We study a toy model of neural nets that exhibits non-linear dynamics and phase transition....
3/n It is unclear whether LLMs in the wild can be robustly interpreted by ad-hoc methods. My approach is to analyze toy models that give insight into the non-linear feature compression of MLPs. Fascinating math challenges are to be expected on this journey!
2/n Studying the loss landscape is essential for understanding optimization and generalization in deep learning. I developed a combinatorial complexity framework that quantifies the non-convexity of deep learning loss landscapes.
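To give a flavor of the combinatorics behind that non-convexity (a toy illustration of the counting, not the framework itself): permuting hidden neurons leaves a network's function unchanged, so a single critical point with m neurons appears in up to m! permuted copies across the landscape, with fewer copies when some neurons coincide.

```python
from math import factorial
from collections import Counter

def permutation_copies(neuron_values):
    """Number of distinct parameter-space copies of a critical point
    obtained by permuting hidden neurons: m! divided by the factorials
    of the multiplicities of repeated neuron values (swapping two
    identical neurons yields the same parameter vector)."""
    m = len(neuron_values)
    copies = factorial(m)
    for mult in Counter(neuron_values).values():
        copies //= factorial(mult)
    return copies

# A network of 4 hidden neurons where two neurons coincide:
print(permutation_copies(["a", "a", "b", "c"]))  # 4!/2! = 12
```

With all neurons distinct the count is the full m!, which is one source of the combinatorial explosion of critical points in overparameterized landscapes.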
📢 I'm on the faculty job market this year! My research explores the foundations of deep learning and analyzes learning and feature geometry for Gaussian inputs. I detail my major contributions below 👇 Retweet if you find it interesting and help me spread the word! My DMs are open. 1/n
The Center for Computational Mathematics at Flatiron Institute is hiring research fellows (postdocs) to start next year -- applications are due December 15. This would be a great position for those working in e.g. the theory/science of deep learning. https://t.co/TdsHvFEs3k