
Enrico Fini
@DonkeyShot21
Followers: 1K · Following: 785 · Media: 30 · Statuses: 229
Member of Technical Staff @MicrosoftAI | Previously RS @Apple MLR, Intern @MetaAI & @amazon | AIMv2, solo-learn, continual pre-training
Zurich, Switzerland
Joined July 2011
We release AIMv2, the second iteration of the AIM family of large autoregressive vision encoders. This time we bring multimodality into the game 🔥 Paper: https://t.co/YpU6T8Pr9p Repo: https://t.co/g1LO5rE5Y0 Model Gallery: https://t.co/j3jZ8TEtf5
6 replies · 36 reposts · 169 likes
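For readers who want to try the encoders, a minimal sketch of feature extraction with the transformers library follows. The checkpoint ID and the trust_remote_code flag are assumptions about how the checkpoints are published on the Hub, not details taken from the tweet.

```python
# Minimal sketch: extracting image features with an AIMv2 checkpoint from the
# Hugging Face Hub. The checkpoint ID and the need for trust_remote_code are
# assumptions about how the models are published, not details from the tweet.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

CKPT = "apple/aimv2-large-patch14-224"  # assumed checkpoint ID

processor = AutoImageProcessor.from_pretrained(CKPT, trust_remote_code=True)
model = AutoModel.from_pretrained(CKPT, trust_remote_code=True)
model.eval()

image = Image.open("cat.jpg").convert("RGB")  # any local image file
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Patch-level features, e.g. for a downstream VQA or retrieval head
# (assuming a standard BaseModelOutput is returned).
print(outputs.last_hidden_state.shape)  # (1, num_patches, hidden_dim)
```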
Super excited to share l3m 🚀, a library for training large multimodal models, which we used to build AIM and AIMv2. Massive thanks to @alaa_nouby @DonkeyShot21 Michal Klein @MustafaShukor1 @jmsusskind and many others.
1 reply · 16 reposts · 53 likes
Microsoft AI is hiring early-in-career talent. If you’ve published at top conferences such as ICLR, NeurIPS, etc., are working on pre-training, post-training, or multimodal, and want to build the world’s most advanced frontier models, DM me and let’s chat! #hiring #microsoftai
2 replies · 1 repost · 11 likes
Introducing MAI-Voice-1 - the most expressive, natural voice generation model I've ever used (might be a bit biased) - super efficient, generating a minute of audio in <1 second on a single GPU - live now in Copilot Daily + Podcasts. Try it in Copilot Labs too:
copilot.microsoft.com
Explore Copilot Labs - Microsoft's hub for experimental AI. Try bold AI experiments, co-create with the community, and help shape the future of Copilot
10 replies · 11 reposts · 173 likes
🚨Text Leaderboard Update: A new model provider, @MicrosoftAI, has broken into the Top 15 this week! 💠MAI-1-preview by @MicrosoftAI debuts at #13. Congrats to the Microsoft AI team! As the Text Arena is one of the most competitive races, breaking into the Top 15 is no small feat.
Introducing MAI-1-preview - our first foundation model trained end-to-end in house - in public testing on LMArena - we’re excited to be actively spinning the flywheel to deliver improved models
19 replies · 44 reposts · 309 likes
Our work on scaling laws for multimodal models and MoEs got an Oral at ICCV. Check it out!
We release a large-scale study to answer the following: - Is late fusion inherently better than early fusion for multimodal models? - How do native multimodal models scale compared to LLMs? - Can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
2 replies · 21 reposts · 141 likes
We propose new scaling laws that predict the optimal data mixture for pretraining LLMs, native multimodal models, and large vision encoders! Only small-scale experiments are needed; we can then extrapolate to large-scale ones. These laws allow… 1/n 🧵
6 replies · 49 reposts · 267 likes
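To make the fit-small-then-extrapolate workflow concrete, here is a toy sketch that fits a generic saturating power law to hypothetical small-scale runs. The paper's actual laws are richer (they also condition on the data-mixture weights), so treat this only as an illustration of the mechanics.

```python
# Toy illustration of the fit-small / extrapolate-large workflow behind
# scaling laws. A generic power law L(C) = E + a * C**(-alpha) stands in for
# the paper's richer form; all data points below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def power_law(compute, E, a, alpha):
    return E + a * compute ** (-alpha)

# Hypothetical (compute, loss) pairs from small-scale runs.
compute = np.array([1e17, 3e17, 1e18, 3e18, 1e19])
loss = np.array([3.10, 2.91, 2.74, 2.60, 2.49])

params, _ = curve_fit(power_law, compute, loss, p0=(2.0, 50.0, 0.1), maxfev=10_000)
E, a, alpha = params
print(f"irreducible loss E={E:.3f}, a={a:.3f}, alpha={alpha:.3f}")

# Extrapolate to a compute budget two orders of magnitude beyond the fits.
print(f"predicted loss at 1e21 FLOPs: {power_law(1e21, *params):.3f}")
```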
Career update: today I joined Microsoft AI in Zurich 🇨🇭 as a Member of Technical Staff. I’m going to miss my friends and colleagues at Apple MLR, but I’m excited for this new opportunity. LFG 🚀
6 replies · 2 reposts · 215 likes
Apple just broke the scaling laws for image models. Imagine creating Ghibli art, but 10x faster.
21 replies · 58 reposts · 863 likes
Apple just dropped Scaling Laws for Native Multimodal Models
10 replies · 54 reposts · 274 likes
Training and scaling large multimodal models from scratch? This is the thread for you. In this new paper, we provide an extensive study with hundreds of runs, fitting scaling laws for early/late fusion models, MoEs, and exploring different data mixtures. Tons of cool findings.
We release a large-scale study to answer the following: - Is late fusion inherently better than early fusion for multimodal models? - How do native multimodal models scale compared to LLMs? - Can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
3 replies · 11 reposts · 94 likes
Scaling Laws for Native Multimodal Models - Early fusion exhibits stronger performance at lower parameter counts, is more efficient to train, and is easier to deploy compared with late fusion. - Incorporating MoEs allows for models that learn modality-specific weights, significantly…
4 replies · 78 reposts · 461 likes
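A minimal sketch of the two designs under comparison may help: early fusion feeds raw patch embeddings and text tokens to a single trunk, while late fusion routes patches through a separate vision encoder first. Dimensions and module choices below are illustrative, not the paper's configurations.

```python
# Minimal sketch contrasting early and late fusion for multimodal models.
# All sizes are illustrative toy values.
import torch
import torch.nn as nn

D, VOCAB, PATCH_DIM = 256, 32_000, 768

class EarlyFusion(nn.Module):
    """One transformer consumes image patches and text tokens in a single sequence."""
    def __init__(self):
        super().__init__()
        self.text_embed = nn.Embedding(VOCAB, D)
        self.patch_embed = nn.Linear(PATCH_DIM, D)  # raw patches -> tokens, no encoder
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, text_ids, patches):
        seq = torch.cat([self.patch_embed(patches), self.text_embed(text_ids)], dim=1)
        return self.trunk(seq)

class LateFusion(nn.Module):
    """A separate vision encoder processes patches before they reach the trunk."""
    def __init__(self):
        super().__init__()
        self.text_embed = nn.Embedding(VOCAB, D)
        vision_layer = nn.TransformerEncoderLayer(PATCH_DIM, nhead=8, batch_first=True)
        self.vision_encoder = nn.TransformerEncoder(vision_layer, num_layers=2)
        self.projector = nn.Linear(PATCH_DIM, D)  # adapter into the trunk's space
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, text_ids, patches):
        vision_tokens = self.projector(self.vision_encoder(patches))
        seq = torch.cat([vision_tokens, self.text_embed(text_ids)], dim=1)
        return self.trunk(seq)

text_ids = torch.randint(0, VOCAB, (1, 16))
patches = torch.randn(1, 49, PATCH_DIM)
print(EarlyFusion()(text_ids, patches).shape)  # (1, 65, 256)
print(LateFusion()(text_ids, patches).shape)   # (1, 65, 256)
```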
Excited to share that we have recently released the source code for FlexTok, bringing a fresh perspective to tokenization. Code on GitHub: https://t.co/ApWNbE2ZO6. Project Page: https://t.co/MlDKYAfSLz
#FlexTok #Tokenization #MachineLearning #MLResearch #OpenSource #AI
0 replies · 7 reposts · 37 likes
We are having a "Where is Molmo? Where is Qwen?" moment in computer vision
Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.
0 replies · 2 reposts · 30 likes
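For context, Web-SSL sits in the self-distillation family of language-free visual SSL methods; a schematic DINO-style objective is sketched below. The temperatures and the centering step are illustrative, and this is not the Web-SSL training recipe.

```python
# Schematic DINO-style self-distillation loss, the family of language-free
# visual SSL objectives that Web-SSL builds on. Temperatures and centering
# are illustrative; this is not the Web-SSL recipe.
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, t_s=0.1, t_t=0.04):
    # Teacher targets: centered and sharpened, no gradient flows through them.
    teacher_probs = F.softmax((teacher_out - center) / t_t, dim=-1).detach()
    student_logp = F.log_softmax(student_out / t_s, dim=-1)
    # Cross-entropy between teacher and student distributions over projections.
    return -(teacher_probs * student_logp).sum(dim=-1).mean()

student_out = torch.randn(8, 256, requires_grad=True)  # projections of view 1
teacher_out = torch.randn(8, 256)                      # projections of view 2
center = teacher_out.mean(dim=0, keepdim=True)         # a running average in practice
print(dino_loss(student_out, teacher_out, center))
```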
FlexTok is a pretty novel dynamic-length image tokenizer. I will be speedrunning training one today at 8:30 AM EST (roughly 3 hours from now) at https://t.co/XNJ9147oCB
13 replies · 34 reposts · 425 likes
Happy to see the Ovis2 multimodal LLMs leveraging our AIMv2 encoders and achieving impressive results, congrats to the team at @AI_AlibabaInt!
Ovis2-34B has achieved remarkable results on the multimodal leaderboards! 🏆 #1 in open-source MLLMs - Multimodal Reasoning (47.9) 📊 #2 in open-source MLLMs - Academic (76.5) Thanks to everyone who contributed to this achievement🎉 #AI #MachineLearning #MLLM
0 replies · 3 reposts · 27 likes
Check out what the team has been cooking! 🍳🔥 Awesome work led by @roman__bachmann @JRAllardice @dmizrahi_
Have you ever been bothered by the constraints of fixed-sized 2D-grid tokenizers? We present FlexTok, a flexible-length 1D tokenizer that enables autoregressive models to describe images in a coarse-to-fine manner. https://t.co/17oJKymhPl
https://t.co/5vSqDxjwFN 🧵 1/n
0 replies · 0 reposts · 8 likes
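A toy stand-in for the coarse-to-fine idea: an ordered token bottleneck trained with nested-dropout-style truncation, so that any prefix of the token sequence decodes to a reconstruction, finer with more tokens. This is a conceptual sketch, not the FlexTok architecture.

```python
# Toy sketch of a flexible-length 1D tokenizer: an ordered register-token
# bottleneck where any prefix of the tokens decodes to an image, finer with
# more tokens. Shapes and modules are illustrative, not FlexTok's.
import torch
import torch.nn as nn

class OrderedTokenBottleneck(nn.Module):
    def __init__(self, num_tokens=32, dim=64, img_dim=3 * 32 * 32):
        super().__init__()
        self.num_tokens, self.dim = num_tokens, dim
        self.encoder = nn.Linear(img_dim, num_tokens * dim)
        self.decoder = nn.Linear(num_tokens * dim, img_dim)

    def forward(self, images, keep=None):
        tokens = self.encoder(images).view(-1, self.num_tokens, self.dim)
        # Nested-dropout-style truncation: keep only the first `keep` tokens,
        # zeroing the rest, so early tokens must carry the coarse content.
        if keep is None:
            keep = torch.randint(1, self.num_tokens + 1, (1,)).item()
        mask = torch.zeros_like(tokens)
        mask[:, :keep] = 1.0
        return self.decoder((tokens * mask).flatten(1))

model = OrderedTokenBottleneck()
x = torch.randn(4, 3 * 32 * 32)
coarse = model(x, keep=4)   # 4 tokens: coarse reconstruction
fine = model(x, keep=32)    # all tokens: finest reconstruction
print(coarse.shape, fine.shape)
```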
Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering: "If I want a small, capable model, should I distill from a more powerful model, or train from scratch?" Our distillation scaling law shows, well, it's complicated... 🧵 https://t.co/b1uuyJwzRF
arxiv.org
We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings mitigate the risks...
12 replies · 150 reposts · 1K likes
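For reference, the temperature-scaled objective from the Hinton et al. paper that prompted the question; the distillation scaling law studies how student performance behaves under this kind of setup. The temperature and loss weighting below are illustrative choices.

```python
# Minimal sketch of the temperature-scaled distillation loss from Hinton et
# al.'s "Distilling the Knowledge in a Neural Network". T and alpha are
# illustrative hyperparameters, not values from the scaling-law paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL between temperature-softened teacher and student
    # distributions, scaled by T**2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 100, requires_grad=True)
teacher_logits = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```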