We have a long history of supporting responsible open source & science, which can drive rapid research progress, so we’re proud to release Gemma: a set of lightweight open models, best-in-class for their size, inspired by the same tech used for Gemini
We are releasing a series of visual features that are performant across pixel- and image-level tasks. We achieve this by training a 1B-param ViT-g on a large, diverse, and curated dataset with no supervision, then distilling it to smaller models. Everything is open-source.
Announced by Mark Zuckerberg this morning — today we're releasing DINOv2, the first method for training computer vision models that uses self-supervised learning to achieve results matching or exceeding industry standards.
More on this new work ➡️
Super excited to share new open LLMs from FAIR with our research community. In particular, LLaMA-13B is competitive with GPT-3, despite being 10x smaller.
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at
1/n
To support innovation in computer vision, we’re making DINOv2 available under the Apache 2.0 license + releasing a collection of DINOv2-based dense prediction models for semantic image segmentation and monocular depth estimation.
Try our updated demo ➡️
Great article by @guillaumgrallet for @LePoint on the unique place of France in AI. Shout out to @Inria for their central role in building the foundations of this ecosystem.
Our work on learning visual features with an LLM approach is finally out. All the scaling observations made on LLMs transfer to images! It was a pleasure to work under @alaaelnouby's leadership on this project, and this concludes my fun (but short) time at Apple! 1/n
Excited to share AIM 🎯 - a set of large-scale vision models pre-trained solely with an autoregressive objective. We share the code & checkpoints of models up to 7B params, pre-trained on 1.2T patches (5B images), achieving 84% on ImageNet with a frozen trunk.
(1/n) 🧵
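For intuition, here is a toy version of the objective: predict each next image patch from the preceding ones with a causal transformer. A minimal sketch in the spirit of AIM, not the released code; all sizes and names are illustrative.

import torch
import torch.nn as nn

class ARPatchModel(nn.Module):
    # Toy autoregressive objective over image patches: predict patch t+1
    # from patches <= t with a simple pixel-regression loss.
    def __init__(self, patch_dim=768, dim=512):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, patch_dim)

    def forward(self, patches):  # (batch, n_patches, patch_dim), raster order
        n = patches.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(n)
        h = self.trunk(self.embed(patches), mask=causal)
        pred = self.head(h[:, :-1])  # predictions for patches 1..n-1
        return nn.functional.mse_loss(pred, patches[:, 1:])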
So excited by the release of the open version of Griffin. The Griffin team has done everything possible to help @srush_nlp win his bet, and now they are open-sourcing a first 2B model to help the community help Sasha.
Announcing RecurrentGemma!
- A 2B model with open weights based on Griffin
- Replaces the transformer with a mix of gated linear recurrences and local attention (toy sketch after this list)
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences
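For a flavor of the recurrent half, here is a toy gated linear recurrence, loosely in the spirit of Griffin's recurrent block; this is a hypothetical simplification, not the released implementation (the real block is vectorized and more involved).

import torch
import torch.nn as nn

class GatedLinearRecurrence(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.input_gate = nn.Linear(dim, dim)
        self.recurrence_gate = nn.Linear(dim, dim)
        self.log_decay = nn.Parameter(torch.zeros(dim))  # per-channel decay

    def forward(self, x):  # x: (batch, seq, dim)
        b, t, d = x.shape
        h = torch.zeros(b, d, device=x.device, dtype=x.dtype)
        outs = []
        for step in range(t):
            xt = x[:, step]
            # Input-dependent decay a in (0, 1), per channel.
            a = torch.sigmoid(self.log_decay) ** torch.sigmoid(self.recurrence_gate(xt))
            gated_in = torch.sigmoid(self.input_gate(xt)) * xt
            # sqrt(1 - a^2) keeps the state variance roughly stable.
            h = a * h + (1 - a**2).sqrt() * gated_in
            outs.append(h)
        return torch.stack(outs, dim=1)

Unlike attention, the state h is constant-size, which is why sampling long sequences is cheap.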
IMHO, Chinchilla is the most impactful paper in the recent development of open LLMs, and its relatively low citation count shows how broken this metric is.
I'm a bit obsessed with the Chinchilla paper. It has the largest ratio of "economic worth / idea complexity" of any paper in AI. If Google had locked it down, it's possible open source would be a year or more behind.
It certainly has been a fun year @Google: enjoy playing with our open-source models Gemma, built from the same research and technology used to create the Gemini models. 💙♊️🚀
Blog:
Tech report:
Team worked hard to address the feedback from the open community to improve the model. Kudos to @robdadashi and colleagues for the hard work. Let us know how it is.
Congratulations to @aidangomez and @cohere for this amazing breakthrough! On the side, our Gemma IT team also pushed our model thanks to the feedback from the open community. Great day for open models!
Exciting news - the latest Arena results are out! @cohere's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level with 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥
Big congrats to @cohere for their incredible work & valuable contribution
Command R+ (⌘ R+) is our most capable model (with open weights!) yet! I’m particularly excited about its multilingual capabilities. It should do pretty well in 10 languages (English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese).
You can
FeatUp
A Model-Agnostic Framework for Features at Any Resolution
Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features
I'm excited to share PaliGemma, an open vision-language model that can be fine-tuned within 20 minutes.
You'll be impressed by how far it goes with only batch size 8 and 64 steps. Try it out yourself, with your free Google Colab account and T4 GPU:
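If you just want to poke at the base model before fine-tuning, a minimal captioning call via transformers looks roughly like this (a sketch; the model id follows the Hugging Face hub naming, the image path is a placeholder):

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("cat.jpg")  # any local image
inputs = processor(text="caption en", images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))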
Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, really appreciate the team for sharing helpful insights and design decisions – Sora is incredible and is set to transform the video generation community.
What we
There is only scale and cosine schedule and AdamW with a batch size that is big but not too big and a post.. no wait pre.. no wait post-norm with RMSNorm and gradient clipping and RoPE with SentencePiece with no dummy whitespace on heavily preprocessed data, duh?
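For anyone keeping score, the schedule being teased is just linear warmup into cosine decay; a minimal sketch with illustrative names:

import math

def lr_at_step(step, max_steps, peak_lr, warmup_steps, min_lr=0.0):
    # Linear warmup to peak_lr, then cosine decay down to min_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))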
To all the defeatists who think there is nothing else but scale:
* 5 years between Attention Is All You Need and FlashAttention
* Transformers still require warmup.
Researchers: get back to work! The future is bright :)
Introducing: Zephyr Gemma!
The community has struggled to do a good preference-tune of Gemma, so we built an open-source recipe and trained a model to help people get started.
Model:
Demo:
Handbook:
Really excited to be part of the founding team of @kyutai_labs: at the heart of our mission is doing open source and open science in AI🔬📖. Thanks so much to our founding donors for making this happen 🇪🇺 I'm thrilled to get to work with such a talented team and grow the lab 😊
🎉 Unveiling PaSS: Parallel Speculative Sampling
🚀 Need faster LLM decoding?
🔗 Check out our new 1-model speculative sampling algorithm based on parallel decoding with look-ahead tokens (generic sketch below):
🤝 In collaboration with @armandjoulin and @EXGRV
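For context, the verification half of speculative sampling looks roughly like this. A generic greedy sketch, not the exact PaSS algorithm: PaSS drafts the candidate tokens with the same model via look-ahead tokens instead of a separate draft model, and speculative sampling papers typically accept/reject stochastically rather than via greedy matching.

import torch

def verify_draft(target_logits_fn, prefix, draft):
    # target_logits_fn: 1D LongTensor of token ids -> logits (seq_len, vocab).
    # Scores all draft tokens in one forward pass, keeps the longest accepted
    # prefix, and always emits at least one token from the target model.
    ids = torch.cat([prefix, draft])
    logits = target_logits_fn(ids)
    n_accepted = 0
    for i in range(len(draft)):
        pos = len(prefix) + i - 1  # position whose logits predict draft[i]
        if logits[pos].argmax() == draft[i]:
            n_accepted += 1
        else:
            break
    pos = len(prefix) + n_accepted - 1
    next_tok = logits[pos].argmax().unsqueeze(0)
    return torch.cat([draft[:n_accepted], next_tok])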
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params!
My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
Gemma v1.1 is officially announced!
@robdadashi led a strike team to fix most of the issues that the open-source community found with our 2B and "7B" IT models. Kudos to them and more to come soon!
I’m excited to announce our latest paper, introducing a family of early-fusion token-in token-out (gpt4o….), models capable of interleaved text and image understanding and generation.
⌘-R
Introducing Command-R, a model focused on scalability, RAG, and tool use. We've also released the weights for research use; we hope they're useful to the community!
I remember @alex_conneau telling me about his dream of building Her only a few years ago, and here we are. Congratulations to you and the whole OpenAI team behind this achievement!
@OpenAI #GPT4o #Audio
Extremely excited to share the results of what I've been working on for 2 years
GPT models now natively understand audio: you can talk to the Transformer itself!
The feeling is hard to describe so I can't wait for people to speak to it
#HearTheAGI
🧵1/N
@ahatamiz1 @arimorcos It was one of the few big points of the MLP-Mixer paper/result, to show that "at scale, any reasonable architecture will work".
We could have followed with a few more papers with a few more architectures, but it was enough and we moved on to other things.
cont.
Introducing Meta Llama 3: the most capable openly available LLM to date.
Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes.
Today's release includes the first two Llama 3
As always, I'm amazed by HF's support of the open community. The new member of the Pali family is out and ready to be tested! Great work from @giffmana and colleagues.
move to NYC.
build open models.
distribute bootleg books of model weights alongside bagels and ice cream trucks.
@srush_nlp @kchonyc @jefrankle and I will be around.
Less than 24 hours after release, C4AI Command-R claims the #1 spot on the Hugging Face leaderboard!
We launched with the goal of making generative AI breakthroughs accessible to the research community - so exciting to see such a positive response. 🔥
open data is critical for the progress of AI, and our AIM work would not have been possible without @Vaishaal's fantastic work. Thank you for making this data available to the community.
Meet DBRX, a new SOTA open LLM from @databricks. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
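Why an MoE is fast at inference: each token is routed to only a few experts, so only the ~36B "active" params run per token even though 132B are stored. A toy top-k MoE layer (sizes and k here are illustrative, not DBRX's actual config):

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim, n_experts=16, k=4):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out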
Introducing the Responsible Generative AI Toolkit!
🔨 Get tools to apply best practices for responsible use of open models such as the latest Gemma models.
📘 Get expert guidance on setting policies, tuning and evaluating your models for safety.
➡️
We’re hiring people to work with us on MLX.
If you’re interested, can write fast GPU kernels, and have machine learning experience, reach out.
More here:
They have a stellar team, so before you conclude they did something wrong/weird, consider that either you are missing something, or whatever you have in mind is not what they trained the model for.
I have no doubt they'll do well.
Will append more to thread if I see more simple Q's.
Very excited to see the new Gemma 1.1. instruct models have just been released! They are better across the board and have addressed some important feedback from the community.
Huge congrats and thanks to all the amazing people involved!
I'm happy to share the release of gemma.cpp - a lightweight, standalone C++ inference engine for Google's Gemma models:
Have to say, it’s one of the best project experiences of my career.
AlphaCode-2 is also announced today, but seems to be buried in news. It's a competitive coding model finetuned from Gemini. In the technical report, DeepMind shares a surprising amount of details on an inference-time search, filtering, and re-ranking system. This may be Google's
@giffmana FAIR is still home to top-tier computer vision researchers like @imisra_, @lvdmaaten, Christoph Feichtenhofer, Peter Dollar, Yaniv Taigman, @p_bojanowski. As @inkynumbers noted, I think a lot of us joined 8-9 years ago and there are cycles in research careers.
@abacaj We will look to improve our models in future iterations and any feedback will be appreciated (through DMs?). Mistral's models are amazing and if they work for you, all the best!
Happy to share - blah blah blah.
Gemma + Griffin = RecurrentGemma
Competitive quality with Gemma-2B and much better throughput, especially for long sequences.
Cracked model from cracked team!
Check it out below 👇
Looking forward to reading the recommendations from the AI commission to the French government. What an amazing team of diverse talents from industry, like Joelle Barral and @arthurmensch, and from academia, like @GaelVaroquaux and @Isabelle_Ryl
Thank you to the Artificial Intelligence Commission for its report.
600 hearings, 7,000 consultations, 25 sessions, and 1 action plan covering training, investment, compute, data access, public research, and global governance.
To get up and running with Gemma locally:
pip install -U mlx-lm
python -m mlx_lm.generate --model google/gemma-7b-it --prompt "Write a quick sort in C++"
You can also (Q)LoRA fine-tune on your laptop 🚀
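A (Q)LoRA run looks roughly like this; flags can differ across mlx-lm versions, and ./my_data is a placeholder for your dataset:
python -m mlx_lm.lora --model google/gemma-7b-it --train --data ./my_data --iters 600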
OMG! This is insane!!
A 7B model is now beating GPT 3.5 in the LMSYS Chatbot Arena - a.k.a. the ONLY BENCHMARK that matters because it is based on blind human eval and can't be gamed.
Starling-7B scores above GPT 3.5, Mistral, and Gemini Pro!! 🤯🤯
Link -
@chriswolfvision tbh, SAM was not designed for downstream tasks while DINOv2 was + we probed ADE20K and INet-1k-NN intensively during the dev of DINOv2, so it's not the fairest metric to support this point.
Command R+ has strong multilingual capabilities. Its tokenizer also compresses multilingual text much better than other tokenizers. For comparison (quick repro sketch after this list), the OpenAI tokenizer uses:
- 1.18x more tokens for Portuguese
- 1.54x more tokens for Chinese
- 1.67x more tokens for
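A rough way to reproduce this kind of comparison yourself (a sketch: the encoding name and sample text are assumptions, and the Command R+ tokenizer requires hub access):

import tiktoken
from transformers import AutoTokenizer

text = "O tempo está muito bonito hoje em Lisboa."  # Portuguese sample
cmd_r = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
openai_enc = tiktoken.get_encoding("cl100k_base")

n_cmd = len(cmd_r.encode(text))
n_oai = len(openai_enc.encode(text))
print(f"Command R+: {n_cmd} tokens, OpenAI: {n_oai} tokens ({n_oai / n_cmd:.2f}x)")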
Another pro-tip for doing really well on evals: just train on the test set. Literally just do it, you have the examples right there.
I.e., here's [redacted] on HumanEval.
Hey! If you are using DINOv2, whether in a startup, in research or whatever, could you send me a DM? I want your feedback on the model.
Reward for you? Simple: next model is gonna be 𝘦𝘷𝘦𝘯 𝘮𝘰𝘳𝘦 suited to your needs 👌
🤝Calling all AI enthusiasts📣
🎨We invite you to showcase Gemma 1.1 model capabilities by building demos using Gradio! We'd be happy to offer GPU grants for the early ones from the community. (Starter sketch below.)
2B:
7B:
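A minimal starter demo could look like this (a sketch; model id from the Hugging Face hub, generation settings are illustrative):

import gradio as gr
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-1.1-2b-it")

def chat(message, history):
    out = pipe(message, max_new_tokens=256, return_full_text=False)
    return out[0]["generated_text"]

gr.ChatInterface(chat).launch()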
📢 The @Apple MLR team in Paris is looking for a strong PhD intern
🔎 Topics: Representation learning at scale, Vision+Language, and multi-modal learning.
Please reach out if you're interested! You can apply here 👇
With the new release of Gemma-2B, I thought I'd see how torch.compile performs.
Gemma 2B for a single prompt runs at 144 tokens/s on a V100, a 4x increase over the uncompiled HF version.
We're working with @huggingface to upstream these improvements too! (Rough repro sketch below.)
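One way to try this yourself (a sketch; speedups depend on GPU and transformers version, and the static KV cache is what lets the graph compile cleanly):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", torch_dtype=torch.float16).to("cuda")
model.generation_config.cache_implementation = "static"  # fixed-shape KV cache
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tok("The capital of France is", return_tensors="pt").to("cuda")
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))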
Here are details on Meta's 24k H100 cluster pods that we use for Llama 3 training.
* Network: two versions, RoCEv2 or InfiniBand.
* Llama 3 trains on RoCEv2
* Storage: NFS/FUSE based on Tectonic/Hammerspace
* Stock PyTorch: no real modifications that aren't upstreamed
* NCCL with
I really loved my time at MLR. Samy has created an amazing research lab with a ton of fantastic researchers, but I felt that a project like Gemini was more aligned with my current goals. n/n
Two new Vice-Presidents of Université PSL were appointed on March 14, 2024:
Sabine Cantournet, Vice-President for Education and Equal Opportunity
Isabelle Ryl, Vice-President for Artificial Intelligence
➡️ Discover their profiles on our website!
This work is another hint confirming the intuition that we are converging across modalities and that a single model may emerge as a form of AGI. I don't know how far we are, but I am very bullish that efforts like Gemini or GPT may get us across the line. 2/n
@_philschmid @OpenAI Google DeepMind, Meta FAIR, @kyutai_labs, ... a lot of labs have had this mission for years. If anything, they may have deviated a bit from this goal because of OAI's recent successes.
Really cool new set of models from Yi. But why is the new standard for IT models to report few-shot results on knowledge-intensive benchmarks? It feels like IT models should be evaluated at 0-shot, not few-shot...
Wow! Yi just released an update on their model family - 6B, 9B, 34B - Apache 2.0 licensed! 🔥
> The 34B competes comfortably with Llama 3 70B
> Overall trained on 4.1T tokens
> Finetuned on 3M instruction tuning samples
> 34B model checkpoint beats Qwen 72B
> Both 6B and 9B beat
@SebastienBubeck @srush_nlp Presenting Phi-3 as a general LLM may not be the right way to show its potential. Maybe framing it as a reasoning LLM would help?
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐
It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
@MrCatid @alaa_nouby @ducha_aiki If you need good features now -> DINOv2.
If you are looking to work on the next potential breakthrough in SSL -> AIM is a good place to start.
It is hard to compare results from contrastive learning research that has matured over 6 years with recent work on autoregressive losses for SSL.
@TheSeaMouse The generation looks good but doesn't stop. My guess is thus that the API doesn't catch the EOS token of the model because it is set up for instruct models and not base models?
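A quick way to sanity-check the EOS handling outside the API (a sketch; "google/gemma-2b" stands in for whatever base model is being served):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
ids = tok("Hello", return_tensors="pt")
# If this stops at EOS but the API does not, the API's stop tokens are off.
out = model.generate(**ids, max_new_tokens=64, eos_token_id=tok.eos_token_id)
print(tok.decode(out[0]))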