Our work on learning visual features with an LLM approach is finally out. All the scaling observations made on LLMs transfer to images! It was a pleasure to work under
@alaaelnouby's
leadership on this project, and this concludes my fun (but short) time at Apple! 1/n
Excited to share AIM 🎯 - a set of large-scale vision models pre-trained solely with an autoregressive objective. We share the code & checkpoints of models up to 7B params, pre-trained on 1.2T patches (5B images), achieving 84% on ImageNet with a frozen trunk.
(1/n) 🧵
This work is another hint confirming the intuition that we are converging across modalities, and that a single model may emerge as a form of AGI. I don't know how far we are, but I am very bullish that efforts like Gemini or GPT may get us across the line. 2/n
I really loved my time at MLR. Samy has created an amazing research lab with a ton of fantastic researchers, but I felt that a project like Gemini was more aligned with my current goals. n/n