Emanuele Bugliarello
@ebugliarello
Followers: 1K
Following: 1K
Media: 62
Statuses: 218
Multimodal researcher @GoogleDeepMind
Grenoble, France
Joined August 2019
Dynamic CFG: adaptive guidance for diffusion models
• Static CFG = “one-size-fits-all” fails across prompts
• New method: online feedback from latent evaluators (CLIP, fidelity, prefs) → dynamic per-step CFG
• Just +1% overhead, big gains in alignment, quality & text
1
30
146
Preprint is out: we solve the CFG conundrum! Simple or out-of-distribution prompts benefit from unconditional generation, but challenging ones require dialing up the guidance strength. There's no need to rely on empirical observations, though. We introduce dynamic CFG via online feedback👇
Dynamic CFG: adaptive guidance for diffusion models
• Static CFG = “one-size-fits-all” fails across prompts
• New method: online feedback from latent evaluators (CLIP, fidelity, prefs) → dynamic per-step CFG
• Just +1% overhead, big gains in alignment, quality & text
2
6
29
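The core idea, a per-step guidance scale driven by an online evaluator, can be sketched roughly as below. This is a minimal illustration only: the diffusers-style unet/scheduler interfaces, the evaluator, and the scale-update rule are assumptions for exposition, not the paper's exact algorithm.

```python
# Illustrative sketch of per-step ("dynamic") classifier-free guidance driven by
# online feedback from a latent evaluator. Interfaces and update rule are assumed.
import torch

def dynamic_cfg_sample(unet, scheduler, evaluator, cond, uncond, latents,
                       base_scale=7.5, min_scale=1.0, max_scale=15.0, step_size=0.5):
    scale = base_scale
    for t in scheduler.timesteps:
        with torch.no_grad():
            eps_c = unet(latents, t, encoder_hidden_states=cond).sample    # conditional
            eps_u = unet(latents, t, encoder_hidden_states=uncond).sample  # unconditional

        # Standard CFG combination, but with a guidance scale that changes every step.
        eps = eps_u + scale * (eps_c - eps_u)
        out = scheduler.step(eps, t, latents)
        latents = out.prev_sample

        # Online feedback: score the current denoised estimate (e.g. CLIP alignment
        # with the prompt, assumed normalised to [0, 1]) and nudge the guidance
        # strength: well-aligned samples get less guidance, poorly aligned ones more.
        score = float(evaluator(out.pred_original_sample))
        scale = max(min_scale, min(max_scale, scale + step_size * (0.5 - score)))
    return latents
```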
TL;DR: A new benchmark (WYD)
🤹 Larger & more diverse than existing ones
🔎 With fine-grained & meticulous annotations by yours truly & metrics
🎯 For video-level and human-targeted evaluations
🫂 That correlate well with human preferences
⇒ many new measurable challenges! 🎳
0
0
1
Frustrated with trying to animate characters with video generation models? Do you end up muttering "What are you doing?" We do too. So, we made a new benchmark (WYD) to push controllable human generation for real-world settings! 📄 https://t.co/lPTJGB03bp 🧑‍💻 https://t.co/btS73xzdBZ
1
1
8
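To make "metrics that correlate well with human preferences" concrete, here is a minimal sketch of that kind of meta-evaluation with placeholder numbers; it is not WYD's actual data or metric.

```python
# Placeholder sketch: rank-correlating an automatic video-level metric with human
# preference ratings, the kind of check used to validate a benchmark's metrics.
from scipy.stats import spearmanr

# Hypothetical per-video scores for one model on the benchmark.
auto_metric = [0.81, 0.44, 0.67, 0.90, 0.35]   # automatic metric (placeholder values)
human_pref  = [4.5,  2.0,  3.5,  5.0,  1.5]    # mean human rating (placeholder values)

rho, pval = spearmanr(auto_metric, human_pref)
print(f"Spearman correlation with human preferences: {rho:.2f} (p={pval:.3f})")
```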
📢#ACL2025NLP This year we received 8276 submissions 👏 which is the highest number in the history of ACL conferences 🙌 If you are not yet involved as a reviewer, AC or SAC, we would encourage you to volunteer as an (emergency) AC or reviewer https://t.co/UhPTpK7hq6 🙏
docs.google.com
Use this form to volunteer to join the ACL 2025 program committee as an (emergency) reviewer or area chair (AC). The reviewers need to be available in March and early April 2025. ACs need to be...
6
42
155
📢 Have you been wondering what workshops are brewing in the *ACL venues in 2025? The list that we've been waiting for is here. Feel free to tag or repost with the organisers. Below are ACL 2025 workshops: #ACL2025NLP #NLProc #workshop 🧵
2
22
67
🚀🚀PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes. 1/7
5
54
260
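A small sketch of what that resolution × size cross product looks like in code, assuming the Hugging Face naming pattern google/paligemma2-<size>-pt-<resolution> and the transformers PaliGemma classes; check the official model cards for the exact identifiers.

```python
# Sketch: enumerating the {3B, 10B, 28B} x {224, 448, 896} grid of pre-trained
# checkpoints and loading one. Repo names follow the Hugging Face pattern
# (e.g. "google/paligemma2-3b-pt-224"); verify against the official model cards.
from itertools import product
from transformers import PaliGemmaForConditionalGeneration, AutoProcessor

sizes = ["3b", "10b", "28b"]
resolutions = [224, 448, 896]
checkpoints = [f"google/paligemma2-{s}-pt-{r}" for s, r in product(sizes, resolutions)]
print(checkpoints)  # 9 pre-trained variants

model_id = checkpoints[0]  # "google/paligemma2-3b-pt-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)
```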
Want to work on the future of multimodal AI? Our Google DeepMind team in Grenoble, led by @CordeliaSchmid, is hiring interns for multimodal AI research (long-video understanding and visual reasoning in 2D and 3D). Email ai.gnb.hiring@gmail.com or find me at #NeurIPS2024!
5
17
184
📢#ACL2025 is inviting nominations and self-nominations to the ACL 2025 programme committee (reviewer or area chair) ➡️ https://t.co/YWQikbGZIv deadline for nominations 🗓️ 16 Dec 2024. 🙏
docs.google.com
Use this form to express your interest in joining the ACL 2025 programme committee as a reviewer or area chair (AC). The review period is 1st to 20th of March 2025. ACs need to be available for...
0
31
88
Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at af_hiring@google.com @CordeliaSchmid
4
49
383
Embrace cultural diversity in your large-scale data! 🌎🌍🌏 @angelinepouget’s study shows that (quantitatively) you have no reason not to 🌸
PSA: Stop pretraining your VLMs on EN-filtered data, even if it improves ImageNet and COCO‼️ Doing so impairs the model's understanding of non-English cultures❗️ I've argued this for years; now we finally publish concrete results for this (imo) intuitively obvious recommendation. A 🧾🧶
1
1
7
PSA: Stop pretraining your VLMs on EN-filtered data, even if it improves ImageNet and COCO‼️ Doing so impairs the model's understanding of non-English cultures❗️ I've argued this for years; now we finally publish concrete results for this (imo) intuitively obvious recommendation. A 🧾🧶
Want your VLM to reflect the world's rich diversity 🌍? We’re very excited to share our recent research on this topic. TLDR: to build truly inclusive models that work for everyone, don’t filter by English, and check out our recommended evaluation benchmarks. (1/7)
10
32
281
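To make the data-curation choice at issue concrete, a toy sketch of an English-only filter versus keeping the full multilingual pool is shown below; the langdetect call and record format are illustrative assumptions, not the paper's pipeline.

```python
# Toy sketch of the curation decision: English-only filtering vs. keeping all
# languages in a VLM pretraining pool. Records and the language-ID step are
# illustrative placeholders.
from langdetect import detect

pairs = [
    {"image": "img_001.jpg", "caption": "A bride in a white dress"},
    {"image": "img_002.jpg", "caption": "Una piñata en una fiesta de cumpleaños"},
    {"image": "img_003.jpg", "caption": "七五三を祝う着物姿の子ども"},
]

# EN-filtered pretraining set: tends to help EN-centric benchmarks (ImageNet, COCO)
# but discards exactly the examples that carry non-English cultural concepts.
en_only = [p for p in pairs if detect(p["caption"]) == "en"]

# Recommended alternative: keep the full multilingual pool so the model still
# sees culturally diverse data.
multilingual = pairs
print(len(en_only), "vs", len(multilingual), "examples kept")
```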
@GoogleDeepMind @OliviaW47557022 @ChuhanZhang5 @isabela_alb @wangsu_gdm @yasumasa_onoe @CyrusRashtchian @jponttuset @aidanematzadeh Overall:
☑️ Fine-grained human rating templates are more consistent with each other
🧑‍⚖️ Reliable prompts and fine-grained templates lead to consistent model ordering
🔝 To compare auto-eval metrics, reliable prompts better measure alignment
Check out our paper for more details!
0
0
3
@GoogleDeepMind @OliviaW47557022 @ChuhanZhang5 @isabela_alb @wangsu_gdm @yasumasa_onoe @CyrusRashtchian @jponttuset @aidanematzadeh We also propose Gecko, a new VQA+LLM metric that improves upon prior work by:
🔍 better coverage of visual words in QAs
🪣 filtering hallucinated QAs
🤷 accounting for the uncertainty in the VQA scores
Gecko obtains the best correlation across human templates on Gecko2K and TIFA160
1
0
3
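A conceptual sketch of such a VQA+LLM metric is below; llm_generate_qa, llm_checks_question, and vqa_answer are hypothetical helpers standing in for real model calls, and the aggregation is illustrative rather than the released Gecko implementation.

```python
# Conceptual sketch of a VQA+LLM alignment metric in the spirit of the tweet:
# generate QA pairs from the prompt, drop hallucinated questions, answer the rest
# with a VQA model on the generated image, and weight answers by VQA confidence.
# The three callables passed in are hypothetical stand-ins for real model calls.

def qa_alignment_score(prompt, image, llm_generate_qa, llm_checks_question, vqa_answer):
    qa_pairs = llm_generate_qa(prompt)                # cover visual words in the prompt
    qa_pairs = [qa for qa in qa_pairs
                if llm_checks_question(prompt, qa)]   # filter hallucinated questions

    total, weight = 0.0, 0.0
    for question, expected in qa_pairs:
        answer, confidence = vqa_answer(image, question)   # VQA answer + its confidence
        total += confidence * float(answer == expected)    # uncertainty-aware credit
        weight += confidence
    return total / weight if weight > 0 else 0.0
```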
@GoogleDeepMind @OliviaW47557022 @ChuhanZhang5 @isabela_alb @wangsu_gdm @yasumasa_onoe @CyrusRashtchian @jponttuset @aidanematzadeh This leads to 100K+ ratings, which allow us to:
⚖️ compare different templates → finer-grained templates (WL and DSG(H)) have better inter-annotator agreement
✨ define a subset of *reliable* prompts, Gecko2K-rel, where annotators agree across models and templates
1
0
3
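One way such a "reliable" subset could be selected is sketched below: keep a prompt only if, for every model, the ratings agree across rating templates. The data layout and threshold are assumptions for illustration, not the paper's exact criterion.

```python
# Illustrative selection of "reliable" prompts: a prompt is kept only if, for each
# model, the per-template mean ratings stay within a small spread (i.e. the rating
# templates agree). Data layout and threshold are assumptions for exposition.
import statistics

def reliable_prompts(ratings, max_spread=0.5):
    """ratings[prompt][model][template] -> list of human scores, normalised to [0, 1]."""
    reliable = []
    for prompt, per_model in ratings.items():
        spreads = []
        for per_template in per_model.values():
            means = [statistics.mean(scores) for scores in per_template.values()]
            spreads.append(max(means) - min(means))   # disagreement across templates
        if max(spreads) <= max_spread:
            reliable.append(prompt)
    return reliable
```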
@GoogleDeepMind @OliviaW47557022 @ChuhanZhang5 @isabela_alb @wangsu_gdm @yasumasa_onoe @CyrusRashtchian @jponttuset @aidanematzadeh We curate a new set of prompts, Gecko2K, that allows us to assess a variety of skills (e.g., counting, spatial, texture) in T2I models.
We then generate images with 4 T2I models and run human evaluation with 4 different rating templates (e.g., side-by-side comparison or Likert).
1
0
4
Check out Gecko 🦎: @GoogleDeepMind's latest work looking at how to evaluate text-to-image technology with:
📊 a new benchmark
🕵️ 100K+ human ratings of state-of-the-art T2I models
🤖 a better human-correlated auto-eval metric
https://t.co/CyB4YwgYjh
5
22
99
PaliGemma - Open Vision Model from Google! 💎
> 3B parameter model - SigLIP + Gemma 2B
> Supports images up to 896 x 896 resolution
> Capable of document understanding, image detection, visual question answering, captioning and more
> In addition to general purpose checkpoints
6
68
315
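Schematically, the composition named in the tweet (a SigLIP image encoder feeding, via a projection, into a Gemma decoder) could look like the toy module below; the dimensions and interfaces are placeholders, not Google's implementation.

```python
# Schematic of the composition described above: a SigLIP-style image encoder feeds
# image tokens, via a linear projection, into a Gemma-style decoder alongside the
# text prompt. Module classes and dimensions here are illustrative placeholders.
import torch
import torch.nn as nn

class PaliGemmaLikeVLM(nn.Module):
    def __init__(self, vision_encoder, language_model, vision_dim=1152, lm_dim=2048):
        super().__init__()
        self.vision_encoder = vision_encoder             # e.g. a SigLIP ViT
        self.projector = nn.Linear(vision_dim, lm_dim)   # map image features to LM space
        self.language_model = language_model             # e.g. a Gemma 2B decoder

    def forward(self, pixel_values, text_embeds):
        image_tokens = self.projector(self.vision_encoder(pixel_values))
        # Prefix the projected image tokens to the text embeddings and decode.
        inputs = torch.cat([image_tokens, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs)
```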