We have a long history of supporting responsible open source & science, which can drive rapid research progress, so we’re proud to release Gemma: a set of lightweight open models, best-in-class for their size, inspired by the same tech used for Gemini
We are releasing a series of visual features that are performant across pixel- and image-level tasks. We achieve this by training a 1B-param ViT-g on a large, diverse, and curated dataset with no supervision, then distilling it to smaller models. Everything is open-source.
Announced by Mark Zuckerberg this morning — today we're releasing DINOv2, the first method for training computer vision models that uses self-supervised learning to achieve results matching or exceeding industry standards.
More on this new work ➡️
Super excited to share new open LLMs from FAIR with our research community. In particular, LLaMA-13B is competitive with GPT-3, despite being 10x smaller.
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at
1/n
To support innovation in computer vision, we’re making DINOv2 available under the Apache 2.0 license + releasing a collection of DINOv2-based dense prediction models for semantic image segmentation and monocular depth estimation.
Try our updated demo ➡️
Great article by @guillaumgrallet for @LePoint on the unique place of France in AI. Shout out to @Inria for their central role in building the foundations of this ecosystem.
Our work on learning visual features with an LLM approach is finally out. All the scaling observations made on LLMs transfer to images! It was a pleasure to work under @alaaelnouby's leadership on this project, and this concludes my fun (but short) time at Apple! 1/n
Excited to share AIM 🎯 - a set of large-scale vision models pre-trained solely with an autoregressive objective. We share the code & checkpoints of models up to 7B params, pre-trained on 1.2T patches (5B images), achieving 84% on ImageNet with a frozen trunk.
(1/n) 🧵
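For intuition, here is a toy version of the objective: predict each next image patch from the preceding ones with a causal transformer. A minimal sketch in the spirit of AIM, not the released code; all sizes and names are illustrative.

import torch
import torch.nn as nn

class ARPatchModel(nn.Module):
    # Toy autoregressive objective over image patches: predict patch t+1
    # from patches <= t with a simple pixel-regression loss.
    def __init__(self, patch_dim=768, dim=512):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, patch_dim)

    def forward(self, patches):  # (batch, n_patches, patch_dim), raster order
        n = patches.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(n)
        h = self.trunk(self.embed(patches), mask=causal)
        pred = self.head(h[:, :-1])  # predictions for patches 1..n-1
        return nn.functional.mse_loss(pred, patches[:, 1:])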
So excited by the release of the open version of Griffin. The Griffin team has done everything possible to help @srush_nlp win his bet, and now they are open-sourcing a first 2B model to help the community help Sasha.
Announcing RecurrentGemma!
- A 2B model with open weights based on Griffin
- Replaces the transformer with a mix of gated linear recurrences and local attention (toy sketch after this list)
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences
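For a flavor of the recurrent half, here is a toy gated linear recurrence, loosely in the spirit of Griffin's recurrent block; this is a hypothetical simplification, not the released implementation (the real block is vectorized and more involved).

import torch
import torch.nn as nn

class GatedLinearRecurrence(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.input_gate = nn.Linear(dim, dim)
        self.recurrence_gate = nn.Linear(dim, dim)
        self.log_decay = nn.Parameter(torch.zeros(dim))  # per-channel decay

    def forward(self, x):  # x: (batch, seq, dim)
        b, t, d = x.shape
        h = torch.zeros(b, d, device=x.device, dtype=x.dtype)
        outs = []
        for step in range(t):
            xt = x[:, step]
            # Input-dependent decay a in (0, 1), per channel.
            a = torch.sigmoid(self.log_decay) ** torch.sigmoid(self.recurrence_gate(xt))
            gated_in = torch.sigmoid(self.input_gate(xt)) * xt
            # sqrt(1 - a^2) keeps the state variance roughly stable.
            h = a * h + (1 - a**2).sqrt() * gated_in
            outs.append(h)
        return torch.stack(outs, dim=1)

Unlike attention, the state h is constant-size, which is why sampling long sequences is cheap.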
IMHO, Chinchilla is the most impactful paper in the recent development of open LLMs, and its relatively low citation count shows how broken this metric is.
I'm a bit obsessed with the Chinchilla paper. It has the largest ratio of "economic worth / idea complexity" of any paper in AI. If Google had locked it down, it's possible open source would be a year or more behind.
It certainly has been a fun year @Google: enjoy playing with our open-source models Gemma, built from the same research and technology used to create the Gemini models. 💙♊️🚀
Blog:
Tech report:
Team worked hard to address the feedback from the open community to improve the model. Kudos to @robdadashi and colleagues for the hard work. Let us know how it is.
Congratulations to @aidangomez and @cohere for this amazing breakthrough! On the side, our Gemma IT team also pushed our model thanks to the feedback from the open community. Great day for open models!
Exciting news - the latest Arena results are out! @cohere's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level with 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥
Big congrats to @cohere for their incredible work & valuable contribution
Command R+ (⌘ R+) is our most capable model (with open weights!) yet! I’m particularly excited about its multilingual capabilities. It should do pretty well in 10 languages (English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese).
You can
FeatUp
A Model-Agnostic Framework for Features at Any Resolution
Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features
I'm excited to share PaliGemma, an open vision-language model that can be fine-tuned within 20 minutes.
You'll be impressed by how far it goes with only batch size 8 and 64 steps. Try it out yourself, with your free Google Colab account and T4 GPU:
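If you just want to poke at the base model before fine-tuning, a minimal captioning call via transformers looks roughly like this (a sketch; the model id follows the Hugging Face hub naming, the image path is a placeholder):

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("cat.jpg")  # any local image
inputs = processor(text="caption en", images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))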
Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, really appreciate the team for sharing helpful insights and design decisions – Sora is incredible and is set to transform the video generation community.
What we
There is only scale and cosine schedule and AdamW with a batch size that is big but not too big and a post.. no wait pre.. no wait post-norm with RMSNorm and gradient clipping and RoPE with SentencePiece with no dummy whitespace on heavily preprocessed data, duh?
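For anyone keeping score, the schedule being teased is just linear warmup into cosine decay; a minimal sketch with illustrative names:

import math

def lr_at_step(step, max_steps, peak_lr, warmup_steps, min_lr=0.0):
    # Linear warmup to peak_lr, then cosine decay down to min_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))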
To all the defeatists who think there is nothing else but scale:
* 5 years between Attention Is All You Need and FlashAttention
* Transformers still require warmup.
Researchers: get back to work! The future is bright :)
Introducing: Zephyr Gemma!
The community has struggled to do a good preference-tune of Gemma, so we built an open-source recipe and trained a model to help people get started.
Model:
Demo:
Handbook:
Really excited to be part of the founding team of @kyutai_labs: at the heart of our mission is doing open source and open science in AI🔬📖. Thanks so much to our founding donors for making this happen 🇪🇺 I'm thrilled to get to work with such a talented team and grow the lab 😊
🎉 Unveiling PaSS: Parallel Speculative Sampling
🚀 Need faster LLM decoding?
🔗 Check out our new 1-model speculative sampling algorithm based on parallel decoding with look-ahead tokens (generic sketch below):
🤝 In collaboration with @armandjoulin and @EXGRV
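For context, the verification half of speculative sampling looks roughly like this. A generic greedy sketch, not the exact PaSS algorithm: PaSS drafts the candidate tokens with the same model via look-ahead tokens instead of a separate draft model, and speculative sampling papers typically accept/reject stochastically rather than via greedy matching.

import torch

def verify_draft(target_logits_fn, prefix, draft):
    # target_logits_fn: 1D LongTensor of token ids -> logits (seq_len, vocab).
    # Scores all draft tokens in one forward pass, keeps the longest accepted
    # prefix, and always emits at least one token from the target model.
    ids = torch.cat([prefix, draft])
    logits = target_logits_fn(ids)
    n_accepted = 0
    for i in range(len(draft)):
        pos = len(prefix) + i - 1  # position whose logits predict draft[i]
        if logits[pos].argmax() == draft[i]:
            n_accepted += 1
        else:
            break
    pos = len(prefix) + n_accepted - 1
    next_tok = logits[pos].argmax().unsqueeze(0)
    return torch.cat([draft[:n_accepted], next_tok])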
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params!
My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
Gemma v1.1 is officially announced!
@robdadashi led a strike team to fix most of the issues that the open-source community found with our 2B and "7B" IT models. Kudos to them and more to come soon!
I’m excited to announce our latest paper, introducing a family of early-fusion token-in token-out (gpt4o….), models capable of interleaved text and image understanding and generation.
⌘-R
Introducing Command-R, a model focused on scalability, RAG, and tool use. We've also released the weights for research use; we hope they're useful to the community!
I remember @alex_conneau telling me about his dream of building Her only a few years ago, and here we are. Congratulations to you and the whole OpenAI team behind this achievement!
@OpenAI #GPT4o #Audio
Extremely excited to share the results of what I've been working on for 2 years
GPT models now natively understand audio: you can talk to the Transformer itself!
The feeling is hard to describe so I can't wait for people to speak to it
#HearTheAGI
🧵1/N
@ahatamiz1 @arimorcos It was one of the few big points of the MLP-Mixer paper/result, to show that "at scale, any reasonable architecture will work".
We could have followed with a few more papers with a few more architectures, but it was enough and we moved on to other things.
cont.
Introducing Meta Llama 3: the most capable openly available LLM to date.
Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes.
Today's release includes the first two Llama 3
As always, I'm amazed by HF's support of the open community. The new member of the Pali family is out and ready to be tested! Great work from @giffmana and colleagues.
move to NYC.
build open models.
distribute bootleg books of model weights alongside bagels and ice cream trucks.
@srush_nlp @kchonyc @jefrankle and I will be around.
Less than 24 hours after release, C4AI Command-R claims the #1 spot on the Hugging Face leaderboard!
We launched with the goal of making generative AI breakthroughs accessible to the research community - so exciting to see such a positive response. 🔥
open data is critical for the progress of AI, and our AIM work would not have been possible without @Vaishaal's fantastic work. Thank you for making this data available to the community.
Meet DBRX, a new SOTA open LLM from @databricks. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
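Why an MoE is fast at inference: each token is routed to only a few experts, so only the ~36B "active" params run per token even though 132B are stored. A toy top-k MoE layer (sizes and k here are illustrative, not DBRX's actual config):

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim, n_experts=16, k=4):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out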
Introducing the Responsible Generative AI Toolkit!
🔨 Get tools to apply best practices for responsible use of open models such as the latest Gemma models.
📘 Get expert guidance on setting policies, tuning and evaluating your models for safety.
➡️
We’re hiring people to work with us on MLX.
If you’re interested, can write fast GPU kernels, and have machine learning experience, reach out.
More here:
They have a stellar team, so before you conclude they did something wrong/weird, consider that either you are missing something, or whatever you have in mind is not what they trained the model for.
I have no doubt they'll do well.
Will append more to thread if I see more simple Q's.
Very excited to see the new Gemma 1.1. instruct models have just been released! They are better across the board and have addressed some important feedback from the community.
Huge congrats and thanks to all the amazing people involved!
I'm happy to share the release of gemma.cpp - a lightweight, standalone C++ inference engine for Google's Gemma models:
Have to say, it’s one of the best project experiences of my career.
AlphaCode-2 is also announced today, but seems to be buried in news. It's a competitive coding model finetuned from Gemini. In the technical report, DeepMind shares a surprising amount of details on an inference-time search, filtering, and re-ranking system. This may be Google's
@giffmana FAIR is still home to top-tier computer vision researchers like @imisra_, @lvdmaaten, Christoph Feichtenhofer, Peter Dollar, Yaniv Taigman, @p_bojanowski. As @inkynumbers noted, I think a lot of us joined 8-9 years ago and there are cycles in research careers.
@abacaj We will look to improve our models in future iterations and any feedback will be appreciated (through DMs?). Mistral's models are amazing and if they work for you, all the best!
Happy to share - blah blah blah.
Gemma + Griffin = RecurrentGemma
Competitive quality with Gemma-2B and much better throughput, especially for long sequences.
Cracked model from cracked team!
Check it out below 👇
Looking forward to reading the recommendations from the AI commission to the French government. What an amazing team of diverse talents from industry, like Joelle Barral and @arthurmensch, and from academia, like @GaelVaroquaux and @Isabelle_Ryl
Thank you to the Artificial Intelligence Commission for its report.
600 hearings, 7,000 consultations, 25 sessions, and 1 action plan covering training, investment, compute, data access, public research, and global governance.
To get up and running with Gemma locally:
pip install -U mlx-lm
python -m mlx_lm.generate --model google/gemma-7b-it --prompt "Write a quick sort in C++"
You can also (Q)LoRA fine-tune on your laptop 🚀
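A (Q)LoRA run looks roughly like this; flags can differ across mlx-lm versions, and ./my_data is a placeholder for your dataset:
python -m mlx_lm.lora --model google/gemma-7b-it --train --data ./my_data --iters 600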
OMG! This is insane!!
A 7B model is now beating GPT 3.5 in the LMSYS Chatbot Arena - a.k.a. the ONLY BENCHMARK that matters because it is based on blind human eval and can't be gamed.
Starling-7B scores above GPT 3.5, Mistral, and Gemini Pro!! 🤯🤯
Link -
@chriswolfvision tbh, SAM was not designed for downstream tasks while DINOv2 was + we probed ADE20K and INet-1k-NN intensively during the dev of DINOv2, so it's not the fairest metric to support this point.
Command R+ has strong multilingual capabilities. Its tokenizer also compresses multilingual text much better than other tokenizers. For comparison (quick repro sketch after this list), the OpenAI tokenizer uses:
- 1.18x more tokens for Portuguese
- 1.54x more tokens for Chinese
- 1.67x more tokens for
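A rough way to reproduce this kind of comparison yourself (a sketch: the encoding name and sample text are assumptions, and the Command R+ tokenizer requires hub access):

import tiktoken
from transformers import AutoTokenizer

text = "O tempo está muito bonito hoje em Lisboa."  # Portuguese sample
cmd_r = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
openai_enc = tiktoken.get_encoding("cl100k_base")

n_cmd = len(cmd_r.encode(text))
n_oai = len(openai_enc.encode(text))
print(f"Command R+: {n_cmd} tokens, OpenAI: {n_oai} tokens ({n_oai / n_cmd:.2f}x)")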
Another pro-tip for doing really well on evals: just train on the test set. Literally just do it, you have the examples right there.
I.e., here's [redacted] on HumanEval.
Hey! If you are using DINOv2, whether in a startup, in research or whatever, could you send me a DM? I want your feedback on the model.
Reward for you? Simple: next model is gonna be 𝘦𝘷𝘦𝘯 𝘮𝘰𝘳𝘦 suited to your needs 👌
🤝Calling all AI enthusiasts📣
🎨We invite you to showcase Gemma 1.1 model capabilities by building demos using Gradio! We'd be happy to offer GPU grants for the early ones from the community. (Starter sketch below.)
2B:
7B:
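A minimal starter demo could look like this (a sketch; model id from the Hugging Face hub, generation settings are illustrative):

import gradio as gr
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-1.1-2b-it")

def chat(message, history):
    out = pipe(message, max_new_tokens=256, return_full_text=False)
    return out[0]["generated_text"]

gr.ChatInterface(chat).launch()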
📢 The @Apple MLR team in Paris is looking for a strong PhD intern
🔎 Topics: Representation learning at scale, Vision+Language, and multi-modal learning.
Please reach out if you're interested! You can apply here 👇
With the new release of Gemma-2B, I thought I'd see how torch.compile performs.
Gemma 2B for a single prompt runs at 144 tokens/s on a V100, a 4x increase over the uncompiled HF version.
We're working with @huggingface to upstream these improvements too! (Rough repro sketch below.)
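One way to try this yourself (a sketch; speedups depend on GPU and transformers version, and the static KV cache is what lets the graph compile cleanly):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", torch_dtype=torch.float16).to("cuda")
model.generation_config.cache_implementation = "static"  # fixed-shape KV cache
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tok("The capital of France is", return_tensors="pt").to("cuda")
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))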
Here are details on Meta's 24k H100 cluster pods that we use for Llama 3 training.
* Network: two versions, RoCEv2 or InfiniBand.
* Llama 3 trains on RoCEv2
* Storage: NFS/FUSE based on Tectonic/Hammerspace
* Stock PyTorch: no real modifications that aren't upstreamed
* NCCL with
I really loved my time at MLR. Samy has created an amazing research lab with a ton of fantastic researchers, but I felt that a project like Gemini was more aligned with my current goals. n/n
Two new Vice-Presidents of Université PSL were appointed on March 14, 2024:
Sabine Cantournet, Vice-President for Education and Equal Opportunity
Isabelle Ryl, Vice-President for Artificial Intelligence
➡️ Discover their profiles on our website!
This work is another hint confirming the intuition that we are converging across modalities and that a single model may emerge as a form of AGI. I don't know how far we are, but I am very bullish that efforts like Gemini or GPT may get us across the line. 2/n
@_philschmid @OpenAI Google DeepMind, Meta FAIR, @kyutai_labs, ... a lot of labs have had this mission for years. If anything, they may have deviated a bit from this goal because of OAI's recent successes.
Really cool new set of models from Yi. But why is the new standard for IT models to report few-shot results on knowledge-intensive benchmarks? It feels like IT models should be evaluated at 0-shot, not few-shot...
Wow! Yi just released an update on their model family - 6B, 9B, 34B - Apache 2.0 licensed! 🔥
> The 34B competes comfortably with Llama 3 70B
> Overall trained on 4.1T tokens
> Finetuned on 3M instruction tuning samples
> 34B model checkpoint beats Qwen 72B
> Both 6B and 9B beat
@SebastienBubeck @srush_nlp Presenting Phi-3 as a general LLM may not be the right way to show its potential. Maybe framing it as a reasoning LLM would help?
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐
It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
@MrCatid @alaa_nouby @ducha_aiki If you need good features now -> DINOv2.
If you are looking to work on the next potential breakthrough in SSL -> AIM is a good place to start.
It is hard to compare results from contrastive learning research that has matured over 6 years with recent work on autoregressive losses for SSL.
@TheSeaMouse The generation looks good but doesn't stop. My guess is thus that the API doesn't catch the EOS token of the model because it is set up for instruct models and not base models?
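A quick way to sanity-check the EOS handling outside the API (a sketch; "google/gemma-2b" stands in for whatever base model is being served):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
ids = tok("Hello", return_tensors="pt")
# If this stops at EOS but the API does not, the API's stop tokens are off.
out = model.generate(**ids, max_new_tokens=64, eos_token_id=tok.eos_token_id)
print(tok.decode(out[0]))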