Enrico Fini (@DonkeyShot21)
Followers: 1K · Following: 785 · Media: 30 · Statuses: 229

Member of Technical Staff @MicrosoftAI | Previously RS @Apple MLR, Intern @MetaAI & @amazon | AIMv2, solo-learn, continual pre-training

Zurich, Switzerland
Joined July 2011
Enrico Fini (@DonkeyShot21) · 11 months ago
We release AIMv2, the second iteration of the AIM family of large autoregressive vision encoders. This time we bring multimodality into the game 🔥 Paper: https://t.co/YpU6T8Pr9p Repo: https://t.co/g1LO5rE5Y0 Model Gallery: https://t.co/j3jZ8TEtf5
6 replies · 36 reposts · 169 likes
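For readers who want to try the encoders, a minimal loading sketch follows. It assumes the checkpoints are published on the Hugging Face Hub under names like apple/aimv2-large-patch14-224; the exact identifiers may differ, so check the model gallery linked above.

```python
# Minimal sketch: load an AIMv2 checkpoint and extract image features.
# The checkpoint name "apple/aimv2-large-patch14-224" is an assumption;
# see the model gallery for the released variants.
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

image = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)

processor = AutoImageProcessor.from_pretrained("apple/aimv2-large-patch14-224")
model = AutoModel.from_pretrained("apple/aimv2-large-patch14-224", trust_remote_code=True)

inputs = processor(images=image, return_tensors="pt")
features = model(**inputs).last_hidden_state  # one embedding per image patch
print(features.shape)
```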
Victor Turrisi (@victorturrisi) · 26 days ago
Super excited to share l3m 🚀, a library for training large multimodal models, which we used to build AIM and AIMv2. Massive thanks to @alaa_nouby @DonkeyShot21 Michal Klein @MustafaShukor1 @jmsusskind and many others.
1 reply · 16 reposts · 53 likes
Jonny Kaye (@k44yej) · 1 month ago
Microsoft AI are hiring for Early in Career talent. If you've published at top conferences such as ICLR, NeurIPS, etc. and are working in pre-training, post-training, or multimodal, and want to build the world's most advanced frontier models, DM me and let's chat! #hiring #microsoftai
2 replies · 1 repost · 11 likes
Mustafa Suleyman (@mustafasuleyman) · 2 months ago
Introducing MAI-Voice-1:
- the most expressive, natural voice generation model I've ever used (might be a bit biased)
- super efficient, generating a minute of audio in <1 second on a single GPU
- live now in Copilot Daily + Podcasts
Try it in Copilot Labs too:
copilot.microsoft.com: Explore Copilot Labs, Microsoft's hub for experimental AI. Try bold AI experiments, co-create with the community, and help shape the future of Copilot.
10 replies · 11 reposts · 173 likes
lmarena.ai (@arena) · 2 months ago
🚨 Text Leaderboard Update: A new model provider, @MicrosoftAI, has broken into the Top 15 this week! 💠 MAI-1-preview by @MicrosoftAI debuts at #13. Congrats to the Microsoft AI team! As the Text Arena is one of the most competitive races, breaking into the Top 15 is no small…
↳ Quoting Mustafa Suleyman (@mustafasuleyman) · 2 months ago
Introducing MAI-1-preview:
- our first foundation model trained end to end in house
- in public testing on LMArena
- we're excited to be actively spinning the flywheel to deliver improved models
19 replies · 44 reposts · 309 likes
Kosta Derpanis (@CSProfKGD) · 2 months ago
Some Friday afternoon reading
10 replies · 15 reposts · 269 likes
Mustafa Shukor (@MustafaShukor1) · 2 months ago
Our work on scaling laws for multimodal models and MoEs got an Oral at ICCV. Check it out!
↳ Quoting Mustafa Shukor (@MustafaShukor1) · 6 months ago
We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- Can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
2 replies · 21 reposts · 141 likes
AK (@_akhaliq) · 3 months ago
Scaling Laws for Optimal Data Mixtures
1 reply · 6 reposts · 43 likes
Mustafa Shukor (@MustafaShukor1) · 3 months ago
We propose new scaling laws that predict the optimal data mixture for pretraining LLMs, native multimodal models, and large vision encoders! Only small-scale experiments are needed, and we can then extrapolate to large-scale ones. These laws allow… 1/n 🧵
6 replies · 49 reposts · 267 likes
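The extrapolation workflow is the interesting bit: fit a parametric loss curve on cheap runs for each candidate mixture, then compare the predictions at the target scale. Below is a toy sketch using a generic saturating power law; this is not the paper's actual functional form, and the run results are made up for illustration.

```python
# Toy illustration of small-scale-fit -> large-scale-extrapolation.
# The functional form L(N) = E + A * N**(-alpha) and all numbers are
# illustrative assumptions, not the paper's fitted laws.
import numpy as np
from scipy.optimize import curve_fit

def power_law(N, E, A, alpha):
    return E + A * N ** (-alpha)

# Hypothetical small-scale results: (model sizes, validation losses) per mixture.
runs = {
    "70% text / 30% image": ([1e7, 3e7, 1e8], [3.10, 2.85, 2.62]),
    "50% text / 50% image": ([1e7, 3e7, 1e8], [3.05, 2.78, 2.57]),
}

target_N = 1e10  # a scale we cannot afford to sweep over
for mixture, (sizes, losses) in runs.items():
    params, _ = curve_fit(power_law, np.array(sizes), np.array(losses),
                          p0=(2.0, 10.0, 0.3), maxfev=10_000)
    print(f"{mixture}: predicted loss at N=1e10 -> {power_law(target_N, *params):.3f}")
```

Picking the mixture with the lowest predicted loss at the target scale is the decision this kind of law enables without running the large job.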
Alessandro Conti (@altndrr) · 4 months ago
What if we stopped treating image classification like a multiple-choice quiz… …and just asked the model: "What's in this image?" Our paper on open-world classification with LMMs got into #ICCV2025! 🎉🌺 Let's talk failures, insights, and flipping mistakes 👇
3 replies · 6 reposts · 24 likes
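The shift the paper proposes is easy to state in code. Here is a conceptual sketch only: lmm_generate is a hypothetical stand-in for any instruction-tuned LMM's generation call, not the paper's API.

```python
# Conceptual sketch of closed-set vs. open-world classification with an LMM.
# `lmm_generate(image, prompt) -> str` is a hypothetical stand-in for any
# instruction-tuned multimodal model's generation call.

def closed_set_classify(image, labels, lmm_generate):
    # The usual multiple-choice framing: the model can only echo one
    # of the candidate labels we hand it.
    prompt = f"Which of these best describes the image: {', '.join(labels)}? Answer with one option."
    return lmm_generate(image, prompt)

def open_world_classify(image, lmm_generate):
    # The open-world framing: no label list at all, so the model's own
    # vocabulary defines the class space (and its failure modes).
    return lmm_generate(image, "What's in this image? Answer with a short noun phrase.")
```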
Enrico Fini (@DonkeyShot21) · 4 months ago
Career update: today I joined Microsoft AI in Zurich 🇨🇭 as a Member of Technical Staff. I'm going to miss my friends and colleagues at Apple MLR, but I'm excited for this new opportunity. LFG 🚀
6 replies · 2 reposts · 215 likes
Anara (@anara) · 6 months ago
Apple just broke the scaling laws for image models. Imagine creating Ghibli art, but 10x faster.
21 replies · 58 reposts · 863 likes
AK (@_akhaliq) · 6 months ago
Apple just dropped Scaling Laws for Native Multimodal Models
10 replies · 54 reposts · 274 likes
Enrico Fini (@DonkeyShot21) · 6 months ago
Training and scaling large multimodal models from scratch? This is the thread for you. In this new paper, we provide an extensive study with hundreds of runs, fitting scaling laws for early/late fusion models and MoEs, and exploring different data mixtures. Tons of cool findings.
↳ Quoting Mustafa Shukor (@MustafaShukor1) · 6 months ago
We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- Can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
3 replies · 11 reposts · 94 likes
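For readers unfamiliar with the two regimes compared in the thread, here is a schematic contrast in PyTorch. Toy dimensions and layer counts, not the architectures used in the paper: early fusion feeds patch and text tokens into one trunk from layer 0, while late fusion runs a separate vision encoder and fuses only its projected outputs.

```python
# Schematic early-fusion vs. late-fusion multimodal models (toy sizes).
import torch
import torch.nn as nn

D, VOCAB, PATCHES = 256, 1000, 64

class EarlyFusion(nn.Module):
    """One transformer consumes image patches and text tokens jointly from layer 0."""
    def __init__(self):
        super().__init__()
        self.patch_embed = nn.Linear(768, D)   # raw patch features -> token space
        self.tok_embed = nn.Embedding(VOCAB, D)
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, patches, text_ids):
        x = torch.cat([self.patch_embed(patches), self.tok_embed(text_ids)], dim=1)
        return self.trunk(x)

class LateFusion(nn.Module):
    """A separate vision encoder runs first; its outputs are projected into the
    language trunk's token space and only then fused."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.patch_embed = nn.Linear(768, D)
        self.vision = nn.TransformerEncoder(layer, num_layers=2)  # vision tower
        self.proj = nn.Linear(D, D)                               # connector/adapter
        self.tok_embed = nn.Embedding(VOCAB, D)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, patches, text_ids):
        v = self.proj(self.vision(self.patch_embed(patches)))
        x = torch.cat([v, self.tok_embed(text_ids)], dim=1)
        return self.trunk(x)

patches = torch.randn(2, PATCHES, 768)
text_ids = torch.randint(0, VOCAB, (2, 32))
print(EarlyFusion()(patches, text_ids).shape)  # (2, 96, 256)
print(LateFusion()(patches, text_ids).shape)   # (2, 96, 256)
```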
Aran Komatsuzaki (@arankomatsuzaki) · 6 months ago
Scaling Laws for Native Multimodal Models
- Early fusion exhibits stronger performance at lower parameter counts, is more efficient to train, and is easier to deploy compared with late fusion.
- Incorporating MoEs allows for models that learn modality-specific weights, significantly…
4 replies · 78 reposts · 461 likes
Afshin Dehghan (@afshin_dn) · 7 months ago
Excited to share that we have recently released the source code for FlexTok, bringing a fresh perspective to tokenization. Code on GitHub: https://t.co/ApWNbE2ZO6. Project page: https://t.co/MlDKYAfSLz #FlexTok #Tokenization #MachineLearning #MLResearch #OpenSource #AI
0 replies · 7 reposts · 37 likes
Enrico Fini (@DonkeyShot21) · 7 months ago
We are having a "Where is Molmo? Where is Qwen?" moment in computer vision
↳ Quoting David Fan (@DavidJFan) · 7 months ago
Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params), which is trained purely on web images, without any language supervision.
0 replies · 2 reposts · 30 likes
Simo Ryu (@cloneofsimo) · 7 months ago
FlexTok is a pretty novel dynamic-length image tokenizer. I will be speedrunning training one today (8:30 AM EST) at https://t.co/XNJ9147oCB, which is roughly in 3 hours.
13 replies · 34 reposts · 425 likes
Alaa El-Nouby (@alaa_nouby) · 8 months ago
Happy to see the Ovis2 multimodal LLMs leveraging our AIMv2 encoders and achieving impressive results. Congrats to the team at @AI_AlibabaInt!
↳ Quoting AI at Alibaba International (@AI_AlibabaInt) · 8 months ago
Ovis2-34B has achieved remarkable results on the multimodal leaderboards!
🏆 #1 in open-source MLLMs, Multimodal Reasoning (47.9)
📊 #2 in open-source MLLMs, Academic (76.5)
Thanks to everyone who contributed to this achievement 🎉 #AI #MachineLearning #MLLM
0 replies · 3 reposts · 27 likes
Enrico Fini (@DonkeyShot21) · 8 months ago
Check out what the team has been cooking! 🍳🔥 Awesome work led by @roman__bachmann @JRAllardice @dmizrahi_
↳ Quoting Roman Bachmann (@roman__bachmann) · 8 months ago
Have you ever been bothered by the constraints of fixed-size 2D-grid tokenizers? We present FlexTok, a flexible-length 1D tokenizer that enables autoregressive models to describe images in a coarse-to-fine manner. https://t.co/17oJKymhPl https://t.co/5vSqDxjwFN 🧵 1/n
0 replies · 0 reposts · 8 likes
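A toy sketch of the flexible-length idea, and emphatically not FlexTok's actual architecture: learned queries produce an ordered 1D token sequence over the image, and downstream code may keep any prefix, so a few tokens give a coarse description and more tokens refine it.

```python
# Toy sketch of a flexible-length 1D tokenizer: any prefix of the ordered
# token sequence is usable, enabling coarse-to-fine image descriptions.
# This illustrates the general idea only, not FlexTok's design.
import torch
import torch.nn as nn

class PrefixTokenizer(nn.Module):
    def __init__(self, dim=256, max_tokens=256):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_tokens, dim))  # ordered slots
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, patch_feats, k):
        # Only the first k learned queries attend to the image features,
        # yielding a k-token description; small k = coarse, large k = fine.
        q = self.queries[:k].expand(patch_feats.size(0), -1, -1)
        tokens, _ = self.attn(q, patch_feats, patch_feats)
        return tokens  # (batch, k, dim)

tok = PrefixTokenizer()
feats = torch.randn(2, 196, 256)        # e.g., 14x14 grid of patch features
for k in (1, 16, 256):                  # coarse -> fine
    print(k, tok(feats, k).shape)
```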
Dan Busbridge (@danbusbridge) · 8 months ago
Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering: "If I want a small, capable model, should I distill from a more powerful model, or train from scratch?" Our distillation scaling law shows, well, it's complicated... 🧵 https://t.co/b1uuyJwzRF
arxiv.org: We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings mitigate the risks...
12 replies · 150 reposts · 1K likes
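As background for the question the thread starts from, here is the classic distillation objective from the Hinton et al. paper it cites, in a standard formulation with the temperature-squared gradient correction. This is the baseline technique the scaling law studies, not the law itself.

```python
# Standard knowledge-distillation loss: soften teacher and student logits
# with temperature T, penalize their KL divergence, and mix with the usual
# cross-entropy on ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 100, requires_grad=True)
teacher_logits = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```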