Ferjad Naeem Profile
Ferjad Naeem

@ferjadnaeem

944 Followers · 1K Following · 11 Media · 363 Statuses

Research Scientist @Google

Zürich, Switzerland
Joined May 2010
@ferjadnaeem
Ferjad Naeem
6 days
First Gemini release with a piece of my work inside 😄, alongside the work of countless other amazing people 🚀🚀🚀
@GoogleDeepMind
Google DeepMind
6 days
Our first release is Gemini 3 Pro, which is rolling out globally starting today. It significantly outperforms 2.5 Pro across the board: 🥇 Tops LMArena and WebDev @arena leaderboards 🧠 PhD-level reasoning on Humanity’s Last Exam 📋 Leads long-horizon planning on Vending-Bench 2
1 reply · 0 reposts · 6 likes
@enisimsar
Enis Simsar
1 month
🚀 Excited to share our new work RefAM: Attention Magnets for Zero-Shot Referral Segmentation, a training-free approach that turns diffusion model attentions into segmentations. By @anna_kukleva_, me, Alessio Tonioni, @ferjadnaeem, @fedassa, @janericlenssen, Bernt Schiele
1 reply · 3 reposts · 7 likes
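The idea above can be caricatured in a few lines. A rough, training-free sketch, assuming you have already hooked a diffusion model and collected per-layer cross-attention maps for the referring token (the hook itself, the aggregation, and the threshold are all illustrative assumptions; RefAM's actual attention-magnet mechanism is more involved, see the paper):

```python
import torch

# Assumption: attn_maps were already extracted from a diffusion model's
# cross-attention layers for one text token; RefAM's real method differs.
def attention_to_mask(attn_maps: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """attn_maps: (num_layers, H, W) cross-attention maps for one text token."""
    heat = attn_maps.mean(dim=0)                                   # aggregate across layers
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)  # normalize to [0, 1]
    return heat > threshold                                        # binary mask, no training

maps = torch.rand(8, 64, 64)    # dummy stand-in for hooked attention maps
mask = attention_to_mask(maps)
print(mask.float().mean())      # fraction of pixels assigned to the referent
```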
@prunetruong
Prune Truong
1 month
🎺Meet VIST3A — Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator. ➡️ Paper: https://t.co/sFqbbUiGOO ➡️ Website: https://t.co/QWMLwXyVcB Collaboration between ETH & Google with Hyojun Go, @DNarnhofer, Goutam Bhat, @fedassa, and Konrad Schindler.
@DNarnhofer
Dominik Narnhofer
1 month
Want to leverage the power of SOTA 3D models like VGGT & Video LDMs for 3D generation? Now you can! 🚀 Introducing VIST3A — we stitch pretrained video generators to 3D foundation models and align them via reward finetuning. 📄 https://t.co/MctMyuDev4 🌐
2 replies · 11 reposts · 88 likes
@ferjadnaeem
Ferjad Naeem
3 months
Oguzhan is an amazing mentor to work with; apply if you are on the internship market.
@oguzhanthefatih
Oğuzhan Fatih Kar
3 months
🚨 Research Internship opportunity at Apple. We're looking for interns to push the limits of multimodal AI agents! 📍 Santa Clara Valley 🇺🇸 & Zurich 🇨🇭 🗓️ Start: ASAP. Send CV + representative work to mint-agent-internship@group.apple.com Also apply:
0 replies · 0 reposts · 7 likes
@ferjadnaeem
Ferjad Naeem
5 months
A big congratulations to the whole Gemini team on pushing this amazing family of models out 😄 Our tech report is out now: https://t.co/5FfTM1LEdN Feels a bit unreal to share the contributor list with so many amazing colleagues.
@GoogleDeepMind
Google DeepMind
5 months
Hot Gemini updates off the press. 🚀 Anyone can now use 2.5 Flash and Pro to build and scale production-ready AI applications. 🙌 We’re also launching 2.5 Flash-Lite in preview: the fastest model in the 2.5 family to respond to requests, with the lowest cost too. 🧵
0 replies · 1 repost · 11 likes
@jacklangerman
Jack (✈️ ICCV) Langerman
5 months
Active Data Curation Effectively Distills Large-Scale Multimodal Models
- compute per-sample loss with a large batch
- only backprop (probabilistically) through samples with high loss; intuition: these are the samples where there is "something to learn"
- if both teacher and
0 replies · 4 reposts · 13 likes
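A minimal sketch of the selection rule the (truncated) tweet describes, under stated assumptions: the softmax scoring, temperature, and keep fraction below are illustrative choices, not the ACID recipe.

```python
import torch

def select_high_loss(per_sample_loss: torch.Tensor, keep_frac: float = 0.25,
                     temperature: float = 1.0) -> torch.Tensor:
    """Sample indices with probability increasing in per-sample loss."""
    k = max(1, int(keep_frac * per_sample_loss.numel()))
    probs = torch.softmax(per_sample_loss.detach() / temperature, dim=0)
    return torch.multinomial(probs, k, replacement=False)

# Toy usage: score the full batch without grad, then backprop only through
# the kept subset -- the samples "where there is something to learn".
losses = torch.rand(1024)          # per-sample losses from a large batch
idx = select_high_loss(losses)
print(idx.shape)                   # indices of the ~256 retained samples
```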
@ferjadnaeem
Ferjad Naeem
5 months
Stop by this amazing work from Vishaal and the team today at CVPR
@vishaal_urao
Vishaal Udandarao
5 months
Our ACID paper showing how you can use active data curation as an effective way to pretrain super-strong smol and efficient VL-encoders. Poster #361 in the Poster Hall from 10:30 AM - 12:30 PM on Saturday, 14th June https://t.co/LiKMgruXmP
0 replies · 1 repost · 9 likes
@wightmanr
Ross Wightman
6 months
timm's got a new vision transformer (NaFlexVit), and it's flexible! I've been plugging away at this for a bit, integrating ideas from FlexiViT, NaViT, and NaFlex and finally ready to merge for initial exploration. The model supports:
* variable aspect/size images of NaFlex (see
5 replies · 38 reposts · 234 likes
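A hedged usage sketch: the exact NaFlexVit model names in timm are an assumption here, so the registry is queried instead of hard-coding one.

```python
import timm
import torch

# Query timm's registry for NaFlexVit variants rather than assuming a name.
names = timm.list_models('*naflex*')
print(names)

if names:
    model = timm.create_model(names[0], pretrained=False).eval()
    # Non-square input to exercise the flexible sizing described in the tweet
    # (assumes dynamic image sizes are enabled by default for these models).
    x = torch.randn(1, 3, 224, 288)
    with torch.no_grad():
        print(model(x).shape)
```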
@sundarpichai
Sundar Pichai
6 months
At #GoogleIO, we shared how decades of AI research have now become reality. From a total reimagining of Search to Agent Mode, Veo 3 and more, Gemini season will be the most exciting era of AI yet. Some highlights 🧵
269 replies · 2K reposts · 14K likes
@mtschannen
Michael Tschannen
6 months
📢 We just released the code for JetFormer at https://t.co/Wgiz3tK9S8 Enjoy!
@mtschannen
Michael Tschannen
1 year
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this during summer and developed a new model: JetFormer 🌊🤖 https://t.co/ngvPzZvUYW A thread 👇 1/
5 replies · 61 reposts · 310 likes
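A minimal sketch of the general recipe the thread describes, under heavy simplifications (one affine-coupling layer instead of a deep flow, a diagonal-Gaussian head instead of JetFormer's mixture model, toy dimensions): an invertible flow maps patches to soft tokens, a causal transformer models them autoregressively, and the flow's log-determinant enters the maximum-likelihood loss.

```python
import torch
import torch.nn as nn

# One invertible affine-coupling layer: half the dims predict scale/shift
# for the other half (a stand-in for JetFormer's deep "Jet" flow).
class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        a, b = x.chunk(2, dim=-1)
        log_s, t = self.net(a).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                  # keep scales well-conditioned
        z = torch.cat([a, b * log_s.exp() + t], dim=-1)
        return z, log_s.sum(dim=-1)                # soft tokens, per-token log|det J|

D = 16
flow = AffineCoupling(D)
backbone = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
head = nn.Linear(D, 2 * D)                         # mean and log-variance per dim

patches = torch.randn(2, 9, D)                     # (batch, tokens, dim) dummy patches
z, logdet = flow(patches)
mask = nn.Transformer.generate_square_subsequent_mask(z.size(1))
h = backbone(z, src_mask=mask)                     # causal: token t sees tokens <= t
mu, logvar = head(h[:, :-1]).chunk(2, dim=-1)      # predict token t+1 from tokens <= t
target = z[:, 1:]
# Gaussian NLL per token, up to an additive constant:
nll = 0.5 * (((target - mu) ** 2) / logvar.exp() + logvar).sum(dim=-1)
loss = nll.mean() - logdet.mean()                  # max likelihood: latent NLL minus log|det|
print(loss)
```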
@mtschannen
Michael Tschannen
7 months
We are presenting JetFormer at ICLR this morning, poster #190. Stop by if you’re interested in unified multimodal architectures!
@mtschannen
Michael Tschannen
1 year
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this during summer and developed a new model: JetFormer 🌊🤖 https://t.co/ngvPzZvUYW A thread 👇 1/
6 replies · 31 reposts · 226 likes
@andrefaraujo
André Araujo
8 months
Google's global PhD Fellowship program will open for applications this week (on Apr 10th)! It supports PhD students in computer science and related fields and connects each fellow with a Google mentor. Learn more and apply at: https://t.co/ynVQDf5xLi (deadline: May 15th, 2025)
0 replies · 1 repost · 5 likes
@ferjadnaeem
Ferjad Naeem
9 months
Check out the strongest open-source dense prediction models from our colleagues!
@kmaninis
Kevis-Kokitsi Maninis
9 months
📢📢 We released checkpoints and PyTorch/JAX code for TIPS: https://t.co/0JUIRML8gr Paper updated with distilled models, and more: https://t.co/zebYMD0VFz #ICLR2025
0 replies · 0 reposts · 5 likes
@rgilman33
Rudy Gilman
9 months
The majority of features in this layer of Siglip-2 are multimodal. I'd expected some multimodality but was surprised that two-thirds of the neurons I tested bind together their visual and linguistic features. This neuron fires for images of mustaches and for the word "mustache"
8 replies · 25 reposts · 295 likes
@ferjadnaeem
Ferjad Naeem
9 months
Fully supportive of this. The machine learning / computer vision review process is broken, with irresponsible reviewers. Glad to see there is some accountability.
@CVPR
#CVPR2026
9 months
#CVPR2025 Area Chairs (ACs) identified a number of highly irresponsible reviewers, those who either abandoned the review process entirely or submitted egregiously low-quality reviews, including some generated by large language models (LLMs). 1/2
0 replies · 0 reposts · 5 likes
@ferjadnaeem
Ferjad Naeem
9 months
Delighted to share that ACED has been accepted at CVPR 2025! Check out our work to see how to distill the strongest smol-sized image-text contrastive models.
@ferjadnaeem
Ferjad Naeem
1 year
Check out our latest work, which explores data curation as a paradigm for learning compute-efficient image-text contrastive models. Had a blast collaborating across Google, DeepMind, Tübingen, and Cambridge on this work.
0 replies · 0 reposts · 16 likes
@mtschannen
Michael Tschannen
9 months
📢2⃣ Yesterday we released SigLIP 2! TL;DR: Improved high-level semantics, localization, dense features, and multilingual capabilities as a drop-in replacement for v1. Bonus: variants supporting native aspect ratio and variable sequence length. A thread with interesting resources👇
5 replies · 34 reposts · 171 likes
@ferjadnaeem
Ferjad Naeem
9 months
Excited to share what we have been up to in image-text embedding models. SigLIP 2 is the most powerful encoder for most open-vocabulary computer vision and multimodal LLM (MMLLM) tasks. Checkpoints are open-sourced, and we look forward to what the community achieves with them.
@XiaohuaZhai
Xiaohua Zhai
9 months
Introducing SigLIP2: now trained with additional captioning and self-supervised losses! Stronger everywhere: - multilingual - cls. / ret. - localization - ocr - captioning / vqa Try it out, backward compatible! Models: https://t.co/3hOdqcy9QD Paper: https://t.co/Tp4D8Syld8
2 replies · 4 reposts · 48 likes
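A minimal zero-shot usage sketch with the released checkpoints. The checkpoint name is one of the published SigLIP 2 variants (swap in a larger one as needed); the sigmoid at the end reflects SigLIP's pairwise sigmoid loss rather than a softmax.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"   # one of the released SigLIP 2 variants
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.new("RGB", (224, 224))       # stand-in for a real image
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# SigLIP trains with a sigmoid (not softmax) loss, so image-text scores are
# independent per pair rather than normalized across candidates:
print(torch.sigmoid(out.logits_per_image))
```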
@haiyang73756134
Haiyang Wang
9 months
Excited to see our paper "Tokenformer: Rethinking transformer scaling with tokenized model parameters" accepted as a spotlight at #ICLR2025! Hope our idea of tokenizing everything can inspire the future of AI. Paper: https://t.co/0ofQpsudSH Code: https://t.co/D3rhZzqMwD
3 replies · 45 reposts · 270 likes
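A sketch of the core idea as stated above, with assumptions flagged: a fixed linear projection is replaced by attention from input tokens to learnable parameter tokens, so capacity scales by appending parameter tokens without changing input/output dimensions. The normalization below is a plain softmax; the paper's Pattention layer uses its own variant, so treat this as illustration only.

```python
import torch
import torch.nn as nn

class ParameterAttention(nn.Module):
    """Input tokens attend to learnable key/value parameter tokens."""
    def __init__(self, dim: int, num_param_tokens: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)
        self.values = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (batch, seq, dim)
        scores = x @ self.keys.t() / x.size(-1) ** 0.5       # queries = input tokens
        return torch.softmax(scores, dim=-1) @ self.values   # plain softmax; paper differs

# Growing the model = appending parameter tokens; token dims stay fixed.
layer = ParameterAttention(dim=64, num_param_tokens=128)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```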
@XiaohuaZhai
Xiaohua Zhai
9 months
Ever thought of training multimodal models with 100 billion 🚀 unique examples? Check out WebLI-100B! The study reveals exciting insights on long-tail tasks, including multilingual and cultural diversity benchmarks. Paper: https://t.co/npNrvPGY53
4 replies · 18 reposts · 159 likes