Ferjad Naeem
@ferjadnaeem
Followers: 944 · Following: 1K · Media: 11 · Statuses: 363
Research Scientist @Google
Zürich, Switzerland
Joined May 2010
First Gemini release with a piece of my work inside 😄, alongside countless other amazing people 🚀🚀🚀
Our first release is Gemini 3 Pro, which is rolling out globally starting today. It significantly outperforms 2.5 Pro across the board:
🥇 Tops LMArena and WebDev @arena leaderboards
🧠 PhD-level reasoning on Humanity’s Last Exam
📋 Leads long-horizon planning on Vending-Bench 2
1 reply · 0 reposts · 6 likes
🚀 Excited to share our new work RefAM: Attention Magnets for Zero-Shot Referral Segmentation, a training-free approach that turns diffusion model attentions into segmentations. By @anna_kukleva_, me, Alessio Tonioni, @ferjadnaeem, @fedassa, @janericlenssen, Bernt Schiele
1 reply · 3 reposts · 7 likes
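For context on the general recipe (a rough, training-free sketch only, not RefAM's actual pipeline): one common way to turn a diffusion model's cross-attention over image patches into a segmentation mask is to reshape the attention weights onto the patch grid, upsample, normalize, and threshold. The grid size, normalization, and threshold below are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_to_mask(attn, image_size, threshold=0.5):
    """Turn a cross-attention map into a binary segmentation mask.

    attn: (num_patches,) attention weights of one text token over image
          patches, assumed to come from a square patch grid.
    image_size: (H, W) of the target mask.
    threshold: cutoff applied after min-max normalization (hypothetical value).
    """
    grid = int(attn.numel() ** 0.5)                    # e.g. a 64x64 patch grid
    attn_2d = attn.reshape(1, 1, grid, grid)           # add batch/channel dims
    attn_up = F.interpolate(attn_2d, size=image_size,  # upsample to pixel grid
                            mode="bilinear", align_corners=False)
    attn_up = (attn_up - attn_up.min()) / (attn_up.max() - attn_up.min() + 1e-8)
    return attn_up.squeeze() > threshold               # boolean mask (H, W)

# Random weights stand in for real diffusion attention here.
mask = attention_to_mask(torch.rand(64 * 64), image_size=(512, 512))
print(mask.shape, mask.float().mean())
```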
🎺Meet VIST3A — Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator. ➡️ Paper: https://t.co/sFqbbUiGOO ➡️ Website: https://t.co/QWMLwXyVcB Collaboration between ETH & Google with Hyojun Go, @DNarnhofer, Goutam Bhat, @fedassa, and Konrad Schindler.
Want to leverage the power of SOTA 3D models like VGGT & Video LDMs for 3D generation? Now you can! 🚀 Introducing VIST3A — we stitch pretrained video generators to 3D foundation models and align them via reward finetuning. 📄 https://t.co/MctMyuDev4 🌐
2 replies · 11 reposts · 88 likes
Oguzhan is an amazing mentor to work with; apply if you are on the internship market.
🚨 Research Internship opportunity at Apple. We’re looking for interns to push the limits of multimodal AI agents! 📍 Santa Clara Valley 🇺🇸 & Zurich 🇨🇭 🗓️ Start: ASAP. Send CV + representative work to mint-agent-internship@group.apple.com Also apply:
0 replies · 0 reposts · 7 likes
A big congratulations to the whole Gemini team on pushing this amazing family of models out 😄 Our tech report is out now: https://t.co/5FfTM1LEdN Feels a bit unreal to share the contributors list with all the amazing colleagues
Hot Gemini updates off the press. 🚀 Anyone can now use 2.5 Flash and Pro to build and scale production-ready AI applications. 🙌 We’re also launching 2.5 Flash-Lite in preview: the fastest model in the 2.5 family to respond to requests, with the lowest cost too. 🧵
0 replies · 1 repost · 11 likes
Active Data Curation Effectively Distills Large-Scale Multimodal Models
- compute per-sample loss with a large batch
- only backprop (probabilistically) through samples with high loss; intuition: these are the samples where there is “something to learn”
- if both teacher and …
0 replies · 4 reposts · 13 likes
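A minimal sketch of the loss-based selection summarized above, assuming a generic classifier with a cross-entropy loss rather than the paper's actual contrastive distillation setup; `keep_frac` is a hypothetical hyperparameter.

```python
import torch
import torch.nn.functional as F

def high_loss_backprop_step(model, optimizer, images, labels, keep_frac=0.25):
    """One training step that backprops only through high-loss samples."""
    with torch.no_grad():                                  # cheap scoring pass
        logits = model(images)
        losses = F.cross_entropy(logits, labels, reduction="none")

    k = max(1, int(keep_frac * len(losses)))
    probs = losses / losses.sum()                          # higher loss -> more likely to be kept
    idx = torch.multinomial(probs, k, replacement=False)   # probabilistic selection

    optimizer.zero_grad()
    selected_logits = model(images[idx])                   # gradient pass on the subset only
    loss = F.cross_entropy(selected_logits, labels[idx])
    loss.backward()
    optimizer.step()
    return loss.item()
```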
Stop by this amazing work from Vishaal and the team today at CVPR
Our ACID paper shows how you can use active data curation as an effective way to pretrain super-strong smol and efficient VL-encoders. Poster #361 in the Poster Hall from 10:30 AM - 12:30 PM on Saturday, 14th June https://t.co/LiKMgruXmP
0 replies · 1 repost · 9 likes
timm's got a new vision transformer (NaFlexVit), and it's flexible! I've been plugging away at this for a bit, integrating ideas from FlexiViT, NaViT, and NaFlex, and I'm finally ready to merge for initial exploration. The model supports:
* variable aspect/size images of NaFlex (see …
5 replies · 38 reposts · 234 likes
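To poke at these variants once they land in a timm release, one option (searching the model registry rather than hard-coding a model id, since the exact registered names aren't guaranteed here) is:

```python
import timm

# Discover NaFlex-style ViT variants shipped with the installed timm version.
candidates = timm.list_models("*naflex*")
print(candidates)

if candidates:
    # Instantiate the first match without pretrained weights just to inspect it.
    model = timm.create_model(candidates[0], pretrained=False)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{type(model).__name__}: {n_params / 1e6:.1f}M parameters")
```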
At #GoogleIO, we shared how decades of AI research have now become reality. From a total reimagining of Search to Agent Mode, Veo 3 and more, Gemini season will be the most exciting era of AI yet. Some highlights 🧵
269 replies · 2K reposts · 14K likes
📢 We just released the code for JetFormer at https://t.co/Wgiz3tK9S8 Enjoy!
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this during summer and developed a new model: JetFormer 🌊🤖 https://t.co/ngvPzZvUYW A thread 👇 1/
5 replies · 61 reposts · 310 likes
We are presenting JetFormer at ICLR this morning, poster #190. Stop by if you’re interested in unified multimodal architectures!
6 replies · 31 reposts · 226 likes
Google's global PhD Fellowship program will open for applications this week! (on Apr 10th) This supports PhD students in computer science and related fields, also connecting to a Google mentor. Learn more and apply at: https://t.co/ynVQDf5xLi (deadline: May 15th, 2025)
0 replies · 1 repost · 5 likes
Check out the strongest open-source dense prediction models from our colleagues!
📢📢 We released checkpoints and Pytorch/Jax code for TIPS: https://t.co/0JUIRML8gr Paper updated with distilled models, and more: https://t.co/zebYMD0VFz
#ICLR2025
0 replies · 0 reposts · 5 likes
The majority of features in this layer of Siglip-2 are multimodal. I'd expected some multimodality but was surprised that two-thirds of the neurons I tested bind together their visual and linguistic features. This neuron fires for images of mustaches and for the word "mustache"
8 replies · 25 reposts · 295 likes
Fully supportive of this. The Machine Learning / Computer Vision review process is broken with irresponsible reviewers. Glad to see there is some accountability.
#CVPR2025 Area Chairs (ACs) identified a number of highly irresponsible reviewers: those who either abandoned the review process entirely or submitted egregiously low-quality reviews, including some generated by large language models (LLMs). 1/2
0 replies · 0 reposts · 5 likes
Delighted to share that ACED has been accepted at CVPR 2025! Check out our work to learn how to distill the strongest smol-sized image-text contrastive models.
Check out our latest work that explores data curation as a paradigm to learn compute-efficient image-text contrastive models. Had a blast collaborating across Google, DeepMind, Tübingen, and Cambridge on this work.
0 replies · 0 reposts · 16 likes
📢2⃣ Yesterday we released SigLIP 2! TL;DR: Improved high-level semantics, localization, dense features, and multilingual capabilities via drop-in replacement for v1. Bonus: Variants supporting native aspect and variable sequence length. A thread with interesting resources👇
5 replies · 34 reposts · 171 likes
Excited to share what we have been up to in image-text embedding models. SigLIP 2 is the most powerful encoder for most open-vocabulary computer vision and MMLLM tasks. Checkpoints are open-sourced and we look forward to what the community achieves with these.
Introducing SigLIP2: now trained with additional captioning and self-supervised losses! Stronger everywhere:
- multilingual
- cls. / ret.
- localization
- ocr
- captioning / vqa
Try it out, backward compatible! Models: https://t.co/3hOdqcy9QD Paper: https://t.co/Tp4D8Syld8
2 replies · 4 reposts · 48 likes
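A minimal zero-shot classification sketch using the Hugging Face transformers API; the checkpoint id below is an assumption, so check the hub for the exact SigLIP 2 model names.

```python
import torch
import requests
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Checkpoint name is an assumption; verify the released SigLIP 2 ids on the hub.
ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of two cats", "a photo of a dog"]

# SigLIP-style models are trained with short, padded text prompts.
inputs = processor(text=texts, images=image, padding="max_length",
                   max_length=64, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Sigmoid (not softmax) matches the SigLIP training objective.
probs = torch.sigmoid(outputs.logits_per_image)
print(dict(zip(texts, probs[0].tolist())))
```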
Excited to see our paper "Tokenformer: Rethinking transformer scaling with tokenized model parameters" accepted as a spotlight at #ICLR2025! Hope our idea of tokenizing everything can inspire the future of AI. Paper: https://t.co/0ofQpsudSH Code: https://t.co/D3rhZzqMwD
3 replies · 45 reposts · 270 likes
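The core idea, as I understand it, is to replace fixed weight matrices with attention between input tokens and learnable parameter tokens, so scaling up amounts to appending more parameter tokens. A simplified sketch with illustrative dimensions and plain softmax normalization, not the paper's exact Pattention layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterAttention(nn.Module):
    """A linear projection replaced by attention over learnable
    key/value parameter tokens (Tokenformer-style, simplified)."""

    def __init__(self, dim_in, dim_out, num_param_tokens=64):
        super().__init__()
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim_in) * 0.02)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim_out) * 0.02)

    def forward(self, x):                        # x: (batch, seq, dim_in)
        scores = x @ self.param_keys.t()         # attend to parameter tokens
        weights = F.softmax(scores / x.shape[-1] ** 0.5, dim=-1)
        return weights @ self.param_values       # (batch, seq, dim_out)

# Scaling then means adding parameter tokens rather than reshaping weight matrices.
layer = ParameterAttention(dim_in=256, dim_out=512, num_param_tokens=64)
out = layer(torch.randn(2, 10, 256))
print(out.shape)  # torch.Size([2, 10, 512])
```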
Ever thought of training multimodal models with 100 billion 🚀 unique examples? Check out WebLI-100B! The study reveals exciting insights on long-tail tasks, including multilingual and cultural diversity benchmarks. Paper: https://t.co/npNrvPGY53
4 replies · 18 reposts · 159 likes