
Behnam Neyshabur
@bneyshabur
Followers
26K
Following
925
Media
120
Statuses
805
Research @AnthropicAI 💼 Past: Gemini @GoogleDeepMind (Co-led Blueshift team) 🧠 LLM Reasoning / AI Scientist 🎒Traveling & Backpacking -- All views are my own!
Joined May 2014
@ethansdyer and I have started a new team at @AnthropicAI, and we’re hiring! Our team is organized around the north star goal of building an AI scientist: a system capable of solving the long-term reasoning challenges and core capabilities needed to push the scientific.
6
19
413
Thrilled to share that I’m joining @AnthropicAI! After 5.5 amazing years at Alphabet, including working on Gemini’s reasoning over the past 2 years, I’m looking forward to advancing Claude’s ability to tackle complex reasoning challenges across a diverse range of domains!
61
23
1K
Some people say that one shouldn't care about publications and that only quality matters. However, the job market punishes those who don’t have publications in top ML venues. I empathize with students and newcomers to ML whose good papers are not getting accepted. #ICLR2021 1/
17
181
1K
Excited to announce that the entire Blueshift team has joined @DeepMind! We will be working with @OriolVinyalsML and others to advance capabilities of LLMs developed by DM / Alphabet! We hope to continue to grow DM's presence in Bay Area and New York in the coming months :-)
32
53
1K
These days, many people are interested in getting a PhD in ML. I think you should think really hard before committing to a PhD program in ML. Why? I'm going to summarize some thoughts in this thread: 1/10
The main author of DALL-E at OpenAI, Aditya Ramesh, has no graduate degree. He has a bachelor from NYU. He worked on a couple of research projects in my lab in his last years. He wanted to do a PhD after graduating. But he did a summer internship at OpenAI, and they kept him.
24
144
1K
Totally agree! Anyone screening applications, and any applicant who thinks their CV is not representative of their skills or potential, might want to read the story of my own PhD application in this thread: 1/
Any document claiming an easy way to gauge grad school applicants needs to be challenged. To wit: While 2 or 3 Unis in Iran are far more selective than others, the # of outstanding candidates far exceeds their enrollment. The given ranking is thus opinion not fact & is misleading
29
164
1K
Very excited to announce a significant milestone in expanding reasoning capabilities of language models! 🎉🎉 We introduce #Minerva🦉: a language model that can solve mathematical questions using step-by-step natural language reasoning: 🧵 1/
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM.
11
123
612
You think the RNN era is over? Think again! We introduce "Block-Recurrent Transformer", which applies a transformer layer in a recurrent fashion & beats Transformer-XL on LM tasks. Paper: W. DeLesley Hutchins, Imanol Schlag, @Yuhu_ai_ & @ethansdyer. 1/
5
66
439
It turns out that it is possible to get this right with minimal change on the original prompt!
🙄 @GoogleAI, “a deep level understanding”? Seriously?! Your system can’t distinguish “a horse riding an astronaut” from “an astronaut riding a horse”. 🙄
12
43
401
Excited to announce an internship opportunity for summer or fall 2021 🔥 The research will explore qualitatively new behaviors of massive (100B+ params🦾) transformers! If you are interested & have related experience please reach me at neyshabur@google.com. Please retweet & share!
5
83
393
I got a one-way ticket and wasn’t sure if things would work out, but working from Alaska turned out to be a great idea! #WorkFromAlaska #WorkFromAnywhere
13
5
359
My wife and I love traveling & backpacking. A year ago today, we broke our lease & moved our belongings to storage to try the #workation lifestyle. We decided to give it a shot for a few weeks & come back if it didn't work out. Amazingly, we are not back yet and we love it this way!
6
1
337
Very excited to join Google today as a research scientist! I am forever grateful for the opportunity to learn from my postdoc advisors @ylecun, @prfsanjeevarora and my PhD advisor Nati Srebro.
21
5
326
🆕 📰: Deep Learning Through the Lens of Example Difficulty. We introduce a measure of computational difficulty and show its surprising relationships with different deep learning phenomena. Paper: with @Robert_Baldock & Hartmut Maennel. 1/
5
60
289
Very excited to share what we have been working on in the last several months: Gemini 1.0! Google Blogpost: DeepMind Blogpost: Technical Report:
Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano. Gemini Ultra’s performance exceeds current state-of-the-art results on
5
29
284
2.5 years ago, our team decided to improve reasoning capabilities of LLMs & Hendrycks MATH has been a valuable benchmark for tracking progress. It's mind-blowing to see the progress since then, from our Minerva paper all the way to this recent update. MATH is now the new MNIST!
I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here:
12
33
274
I'm only asking people to think hard before committing to an ML PhD program. But an ML PhD could still end up working great for many! Also, I covered particular, less-discussed cons and did not intend to provide the full picture. 🧵 My own PhD was truly a roller coaster 🎢: 1/n
These days, many people are interested in getting a PhD in ML. I think you should think really hard before committing to a PhD program in ML. Why? I'm going to summarize some thoughts in this thread: 1/10
4
18
233
ML twitter is amazing! At this point, arXiv papers and even blogposts are too slow for ML and "twitter papers" can go a long way (particularly when the code is released like this one) 😅
Along with many others, I find the results of Git Re-Basin by @SamuelAinsworth, J. Hayase & @siddhss5 highly interesting. But I believe there is a crucial detail which deserves attention: The authors replace BatchNorm with LayerNorm in their ResNet and VGG implementations. 1/14
2
10
222
Same here. During my PhD (6 years), I wasn't able to go outside the US to attend conferences or visit my family because of my single-entry visa. #MultipleEntryVisa should be granted to all international students!
While we are at it, can we grant international students #MultipleEntryVisa for the duration of their studies (instead of single-entry)? It may sound like a minor issue for many but it is actually a big deal for many international students. I explain it below. 👇
0
26
211
Pheeew! Thanks for practicing social distancing! @LakeClarkNPS #alaska #bear #SocialDistancing #COVID19
12
5
199
If you are a CS graduate applicant this year or next who might be negatively affected by such "quick guides", feel free to book a time with me through @ml_collective office hours: Also, you might want to read the story of my own PhD application: 👇
Application review season is coming up! If you are a CS faculty trying to review Iranian applicants, here is a quick guide on how to gauge them: #AcademicChatter.
5
26
177
Interested in Reasoning with Large Language Models? We are hiring! Internship: Full-Time Research Scientist: Full-Time Research Engineer: Learn more about Blueshift Team:
Interested in Large Language Models? Stop by our 4 posters at #NeurIPS2022 on Tuesday. 👇
6
18
166
I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here:
Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, our drastic
9
17
166
Fast forward to 2019, with a PhD from @TTIC_Connect, several well-known universities that wouldn't even seriously consider me for their PhD program gave me offers to become a tenure-track faculty member, which I declined for reasons that are outside the scope of this tweet. 12/
1
1
158
Come to our talks and posters at #ICLR2021 to discuss our findings on understanding and improving deep learning! Talks and posters are available now! Links to the talks, posters, papers and code in the thread: 1/7
1
23
152
6.9% --> 91.1% on MATH. AI is definitely hitting a wall 😏
One other thing in the updated Gemini 1.5 Pro report: we show how a research model that is a mathematics-specialized version of 1.5 Pro achieves a record score of 91.1% on the MATH benchmark (the SOTA just 3 years ago, in May, 2021 was 6.9%!).
9
5
153
Silver medal in the International Math Olympiad! And we were so close to getting a gold medal! Congrats to the AlphaProof, AlphaGeometry and the informal proof teams. At this point, it’s very hard to predict where we will be in the years ahead!
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈. It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵
3
11
148
I have been hosting a weekly ML Collective Office Hour for the last 4 months and it has been a very positive experience. It is open to EVERYONE and can be about ANYTHING (research, career, etc.).
Do you know ML Collective has an "Office Hours" service? You can book 1:1 chats with researchers who kindly open up their calendars to serve the community. Current ongoing sessions are proudly hosted by @bneyshabur @AndreaMadotto @natolambert
1
9
134
Acceptance of this paper to #ICLR2023 is particularly rewarding to me because it is a very successful example of what I was envisioning when I created the collaboration request form that is open to everyone as part of @ml_collective: 1/3
Why don’t current model merging results generalize to standard ConvNets? And how can this be fixed?. We answer these Qs and present a method that improves merged NN performance for any choice of norm layer. W/ @HanieSedghi @osaukh @rahiment @bneyshabur
2
16
123
All US universities rejected my application without even interviewing me, except @TTIC_Connect, where a Prof. decided to interview me. @TTIC_Connect was less known back then and was recommended to me by my dear friend @ArashVahdat. 3/
2
1
109
Looking back on papers published on generalization of deep networks, a paper published by @KDziugaite and @roydanroy about two years ago wins my "imaginary" test of time award: There is a lot of novelty in this work!
4
15
107
Sending ❤️ to @ilyasut and all amazing OpenAI colleagues. You didn't deserve to go through this. We are all part of the same small community and no matter what happens, we have each other's back. I'm sure OpenAI team continues to build amazing things wherever they are 💯.
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
2
6
106
I'm personally super excited about #ICLR2023. I have already booked my flight tickets, planned a guided trek to Nyungwe and Volcanoes National Parks in Rwanda, followed by #workation and safaris in South Africa and Kenya! Will share more details soon for those who are interested!
Let's goooooo! ML conferences are experiencing an identity crisis — they are all the same. Same people, same papers, same talks. What's distinct about this year's ICLR is that the special location will present the most different demographic from all other ML confs. 🧶 1/5.
3
4
102
We have a bold conjecture! We think "there is some truth to it" but "it is not true as stated". We state it as is, show our extensive experiments fall short of refuting it & we hope that people who find it exciting try to refute it and replace it with a better one :-) #science.
🆕 The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks. Our conjecture: Taking permutations into account, there is likely no barrier in the linear interpolation between SGD solutions. w/ @HanieSedghi @osaukh @bneyshabur 1/10
1
12
96
A short thread with a few points regarding the relationship between SAM, sharpness and generalization: 1/
@jeankaddour @jeremyphoward @CsabaSzepesvari Here’s a SAM author on flatness. SAM has the word “sharpness” in the title but beyond that, its connection to sharpness is poorly understood.
2
15
96
1st day of our 9-day backpacking trip. Highlights: a tricky creek-crossing, hail on a sunny day, seeing a caribou. #alaska #backpacking #wrangell #glacier #notrail
2
0
94
After several years of reviewing & AC work for @NeurIPSConf, @iclr_conf & @icmlconf, I have strong opinions about the reviewing system and some suggestions that many may not like or agree with. Summarizing my points in this thread (hastily written & NOT carefully considered): 1/
3
8
92
Climbing #Iztaccihautl (white woman) was such an incredible #mountaineering experience. It is a 5,230m (17,160ft) dormant #volcano located in #mexico close to #mexicocity and north of its twin, #popocatepetl, which is an active volcano (you can see an eruption in one of the pictures).
2
0
89
Pictures I took during the flight that dropped us off in the backcountry to start our 9-day backpacking trip in Alaska. #alaska #backpacking #wrangell #glacier #notrail
2
0
84
Yes! Sergey Brin has been a core technical contributor, showing up to work in the office together with other Gemini team members! Other than his technical contributions, I have been amazed at how his presence has energized everyone 💙
@0interestrates @nearcyan Name order was randomised (except for the first 6 names which spell out Gemini) - Sergey was in with us basically every day, often pairing!
0
3
78
Just finished 12 back-to-back 20-min meetings with 12 amazing people who signed up for this and I'm not even a bit tired. It was a very encouraging experience!
If you believe you are at a disadvantage in ML community (e.g. because of your race, gender, nationality, background or other circumstances) and need guidance & help, I'd love to meet you! Just pick a time from here: Please RETWEET to spread the word!
0
0
79
At this point, I'm convinced that this cannot be explained by a combination of luck and quality of the papers. My belief is that the current system has lots of unnecessary and sometimes harmful biases, which is #unfair to newcomers and anyone who is outside of the "norm". 3/
1
3
76
I'm very excited about this release: Gemini 1.5 Pro - A highly capable multimodal model with a 10M(!!!) token context length!
Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length. Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long
0
3
76
This plane is going to drop us off in Wrangell-St. Elias’s backcountry for a 9-day backpacking trip where there is no trail or cell coverage. You can follow our location via this link, which gets updated by our sat. communicator: #alaska #wrangell #backpacking
2
0
69
Think about all the issues women deal with in our field. Add to this all the restrictions Iranians face in the US! Want to support Iranian Women in our field? IranWiC has a great mentorship program and as one of its board members, I assure you every $1 donation translates to high impact!
We have started a giving campaign this October to support the mentorship efforts offered by IranWiC team to make a difference in the career path of Iranian women in computing and help them achieve their goals. To contribute, please visit:
0
11
70
See our @googleai blog post on a new framework to study generalization based on an empirically verified conjecture that connects generalization to online optimization. This is joint work with @PreetumNakkiran and @HanieSedghi.
A core challenge in #DeepLearning is the disconnect between the theory of how models generalize and how they perform in practice. A new theoretical framework demonstrates how to understand model generalization through optimization behavior. Check it out at
1
2
68