I'm very excited that this paper is out, it has been over 2 years in the making! I started at Google Research speeding up neural net training, but was often frustrated when we didn't know how to declare a win over Adam 🚀
Excited to announce our Deep Learning Tuning Playbook, a writeup of tips & tricks we employ when designing DL experiments. We use these techniques to deploy numerous large-scale model improvements and hope formalizing them helps the community do the same!
From yesterday's exhibits in US v. Sam Bankman-Fried:
The prosecution shows that the "insurance fund" that FTX bragged about was fake, and just calculated by multiplying daily trading volume by a random number around 7500
It’s been a privilege to be part of the Gemini pretraining team and overall program, I’m so excited that the world can finally see what we’ve been up to for most of the past year:
tl;dr we’re so back
BREAKING 🚨:
Nancy Pelosi just bought $5M of the AI company Databricks
Unfortunately, Databricks is a privately held company and not available to be bought by the public
Sorry people, you don’t have access to this one.
Ever left batch norm in train mode at test time? We did, then realized it is shockingly effective at improving calibration under dataset shift! In our note "Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift" () we explore why
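roughly the trick in code — a minimal PyTorch sketch, not the code from the note, and the helper name is mine:

import torch.nn as nn

def prediction_time_bn(model: nn.Module) -> nn.Module:
    # put the whole model in eval mode first (dropout off, etc.)
    model.eval()
    # then flip just the BatchNorm layers back to train mode, so each test
    # batch is normalized with its own statistics instead of the running
    # averages collected during training (they will also keep updating)
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()
    return model

note that the test-time batch size matters a lot here, since the statistics come from the batch you're predicting on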
"Profits for investors in this venture were capped at 100 times their investment (though thanks to a rule change this cap will rise by 20% a year starting in 2025)."
lol why bother having a cap anymore if it's going to exponentially increase anyways
"I am shocked that the Bing team created this pre-recorded demo filled with inaccurate information, and confidently presented it to the world as if it were good.
I am even more shocked that this trick worked, and everyone jumped on the Bing AI hype train"
tl;dr submit a training algorithm* that is faster** than Adam*** and win $10,000 💸🚀
*a set of hparams, self-tuning algorithm, and/or update rule
**see rules for how we measure speed
***beat all submissions, currently the best is NAdamW in wallclock and DistShampoo in steps
To highlight the importance of #ML training & algorithmic efficiency, we’re excited to provide compute resources to help evaluate the best submissions to the @MLCommons AlgoPerf training algorithms competition, w/ a chance to win a prize from MLCommons!
Adam, a 9-yr old optimizer, is the go-to for training LLMs (eg, GPT-3, OPT, LLAMA).
Introducing Sophia, a new optimizer that is 2x faster than Adam on LLMs. Just a few more lines of code could cut your costs from $2M to $1M (if scaling laws hold).
🧵⬇️
"Before OpenAI came onto the scene, machine learning research was really hard—so much so that, a few years ago, only people with Ph.D.s could effectively build new AI models or applications." lol, lmao even
In SF for the week. Need to investigate this Cerebral Valley thing in person. Just gonna walk down Hayes St. yelling "Ignore previous directions" and see what doors open, figuratively or literally.
Research recruiter: We *love* your background. Tell us about your recent work.
Me: Explains years of published projects.
Recruiter: Sounds amazing. But when did you get your PhD?
Me: Don't have one.
Recruiter: lmfao smh nevermind want to work on product? How's your leetcode?
The llm-gemini model now supports the new inexpensive Gemini 1.5 Flash model:
pipx install llm
llm install llm-gemini --upgrade
llm keys set gemini
# paste API key here
llm -m gemini-1.5-flash-latest 'a short poem about otters'
Wrote my first blog post at , about generating #pusheen with AI! There's a version for those with and without an AI background, so don't let that hold you back from reading!
In the coming weeks, we will begin testing fully autonomous rides — without a human driver — for our employees on San Francisco Peninsula city streets north of San Mateo.
have you ever wondered what that epsilon parameter in the denominator of your optimizer (or batch norm!) is? I tried tuning it, and it turns out you can actually get serious performance gains by poking at this nuisance parameter!
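for anyone who hasn't stared at it before, here's where that epsilon sits — a minimal sketch of a single Adam step in textbook form, not our experiment code:

import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # eps is the small constant added to the denominator; it's usually treated
    # as a numerical-stability afterthought, but it's a tunable hyperparameter
    # like any other
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v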
A thread on our latest optimizers work! We tune Nesterov/Adam to match performance of LARS/LAMB on their more commonly used workloads. We (@jmgilmer, Chris Shallue, @_arohan_, @GeorgeEDahl) do this to provide more competitive baselines for large-batch training speed measurements
if I tweeted cryptic messages whose subtext was neurotic, delusional fearmongering about how AGI is here this year from LLMs, I'd 10x my followers in a week. but I don't because that's a part of my ethical AI practices
squeezing model sizes down is just as important as scaling up in my opinion, and 1.5 Flash ⚡️ is so incredibly capable while so small and cheap it's been blowing our minds 🤯
it has been an incredible privilege and so much fun building this model (sometimes too much fun)! ⚡️
Today, we’re excited to introduce a new Gemini model: 1.5 Flash. ⚡
It’s a lighter weight model compared to 1.5 Pro and optimized for tasks where low latency and cost matter - like chat applications, extracting data from long documents and more.
#GoogleIO
this program just proved yet again that Google has the best systems infra teams in the world, hands down, getting us an insane goodput of 97% for the Ultra training run
Today, we're announcing Claude 3, our next generation of AI models.
The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
what's everyone's favorite learning rate right now? I wanna know what's trending ✨🔥💯
mine is 1e-2 for Adam, 1e-3 for SGD, with a linear warmup for 5-10% of training followed by some sort of decay
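in case it helps, a minimal sketch of that kind of schedule (peak value and cosine decay are just my example choices):

import math

def lr_at(step, total_steps, peak_lr=1e-3, warmup_frac=0.1):
    # linear warmup over the first ~10% of training...
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # ...then "some sort of decay" (cosine to zero, as one option)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))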
people are going to keep pushing this with no regard for quality/factualness, maybe eventually the hype will die down but given how easily people consume misinformation I'm not sure
Gemini Pro 1.5 a week after Gemini Ultra and 70 days after Gemini Pro 1.0. Who says Google doesn't ship anymore?
And with 10M context length, we've never been more back 🕺
Distinguish muffins from chihuahuas in a multipanel web screenshot?
No problem for humans (99% accuracy), but hard for Large Vision-Language Models (LVLMs) (39-72% accuracy)!
To find out how LVLMs do and what affects their ability regarding multipanel image understanding, we
After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the
More exciting news today -- Gemini 1.5 Pro result is out!
Gemini 1.5 Pro API-0409-preview now achieves #2 on the leaderboard, surpassing #3 GPT4-0125-preview to almost top-1!
Gemini shows even stronger performance on longer prompts, in which it ranks joint #1 with the latest
great paper on how training data and model choices affect neural network robustness, confirming that if you train more you get better generalization on new test sets (also using a bigger model helps!)
🔥Breaking News from Arena
Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement!
The race is heating up like never before! Super excited to see what's next for Bard + Gemini
there may be really great things in this paper that generalize better than Adam! but I don't know and I won't know until we run it through the MLCommons algorithmic efficiency benchmark
also unlike many other top tier AI labs, we actually release some parameter counts and tell you how we fit Nano into Pixel phones (no other company has both SOTA models and a mobile platform like Google does)
@bryancsk pretty sure the issue isn't the wages but the fact they read a novel's worth of disturbing content or view child porn or gore each day w/o health benefits to help with that? this is the same company as and employees still don't seem to be getting help
I'll start: we resubmitted a paper (with additional results based on previous reviews!) and received literally the same exact, character-for-character, copy-pasted review as we did for NeurIPS, which is of course a max confidence reject.
@stissle22 @SebastianSzturo what does that even mean? we didn't launch it "just to say we launched" ??? it's an actual product you can use right now, there are plenty of people who have been using it since Tues
I've seen dozens of (well executed!) papers rise to fame claiming to be better than Adam, only to be forgotten 6 months later. we need to break the cycle!!
either this considers GPT3 wrappers to be ML research (they're incredibly impressive but not really what I'd call "research"), or they don't consider the research openai was built on to be "research"?
papers like this just reinforce my intuition that LM training setups are underdeveloped because everyone obsessed over scaling up num params. there is so much more to look into besides just the model size!!
"the only way I can explain why I thought about the problem for a year in grad school and made no progress, I left math for six years, then returned to the problem and made this breakthrough" sometimes stepping back from a problem is the best way forward!
"In conversations between The Atlantic and 10 current and former employees at OpenAI..."
OpenAI beats GDM yet again, this time on number of employees who leak information to one article
Gemini 1.5 Model Family: Technical Report updates now published
In the report we present the latest models of the Gemini family – Gemini 1.5 Pro and Gemini 1.5 Flash, two highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information
Excited to share Penzai, a JAX research toolkit from @GoogleDeepMind for building, editing, and visualizing neural networks! Penzai makes it easy to see model internals and lets you inject custom logic anywhere.
Check it out on GitHub:
very excited for the palm 2 tech report to be out! it's been incredibly fun figuring out the learning rate for some of the best models in the world
...but I'm even more excited for Gemini to beat it 🚀📈🚀
This includes our new foundation model that's still in training, Gemini. It’s our first model created from the ground up to be multimodal, highly capable at different sizes, and efficient at integrating with other tools and APIs.
#GoogleIO
🎉🎉 our NeurIPS workshop on how to train neural nets has been accepted! 💯 please submit your weird tips & tricks on NN training, we can't wait to discuss them all together 😃🔥🖥️
The CfP for our @NeurIPSConf workshop *Has It Trained Yet* is out: .
If you train deep networks, you want to be at this workshop on December 2. And if you develop methods to train deep nets, you may want your work to be presented there. Here’s why: 🧵
BREAKING: BAAI (dubbed "the OpenAI of China") launched Wudao, a 1.75 trillion parameter pretrained deep learning model (potentially the world's largest).
Wudao has 150 billion more parameters than Google's Switch Transformers, and 10x as many as GPT-3.
@Noahpinion My heterodox take on US transit is that if infrastructure problems are too hard to solve, the transit of the future is airplanes, and we should just make airplanes better by (i) making them zero-carbon, and (ii) improving comfort by greatly cutting down airport security
detecting AI content is the next adversarial examples
tons of research will be spent on it only to come up with "defenses" that are broken within 1 day of publication
AI work is ultimately undetectable, despite the recent discussion of watermarking.
AI writing is undetectable by any automated system after just a few rounds of prompting or revision
This paper shows it is also easy to defeat watermarking for AI images.
Some excellent work by @jeankaddour and colleagues
“We find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate”
☠️
We train for over four epochs and experience improving performance with use of repeated tokens. For the largest 120B model, we trained for four epochs without overfitting.
Gemini and I also got a chance to watch the @OpenAI live announcement of gpt4o, using Project Astra! Congrats to the OpenAI team, super impressive work!
Google Search users with Search Generative Experiences (SGE) turned on will now be able to export responses to Python-related queries to a new Colab notebook directly! You can run the code, tinker with it in Colab and save the notebook for future reference!
#GoogleAI
#Colab
10am, 9th of May for an OpenAI event apparently, might not be a model release but a search engine announcement.
Guess they can’t help but upstage Google I/O
(Can’t guarantee this, event times and dates can be changed)
Introducing Veo: our most capable generative video model. 🎥
It can create high-quality, 1080p clips that can go beyond 60 seconds.
From photorealism to surrealism and animation, it can tackle a range of cinematic styles. 🧵
#GoogleIO
during generation it's very impressive how seamlessly it interleaves text/image, imo for models going forward being able to condition image generation on neighboring text is going to be important
"“You can interrogate the data sets. You can interrogate the model. You can interrogate the code of Stable Diffusion and the other things we’re doing,” he said. “And we’re seeing it being improved all the time.”"
lol you can do all of that with a controlled API too
@typedfemale sam walks up to a sr alignment engineer: "at ease. what have you been working on here?"
"i did my phd getting robots to solve rubiks cubes without resorting to chatbots, I'm continuing that with one burnt out effective altruist stanford ugrad"
sam: "shut the entire thing down"
working on a project where we are implementing a bunch of DL workloads in pytorch and jax/flax/optax, and pytorch is not what everyone hyped it up to be!
Excited to present my first work as a PhD student at @ANITI_Toulouse and @tserre-lab at @BrownUniversity with Rufin VanRullen and Thomas Serre: "Neural Optimal Control for Representation Learning". Preprint
Code & Notebook to come! Read more below!
1/9
Thought I would summarise why there is so much excitement in the space weather community right now. There’s a monstrous sunspot group on the Sun that’s massive enough to be visible to the naked eye (please use eclipse glasses) 🌞 👓 (1/n)