David Romero
@Davidromogr
159 Followers · 17K Following · 29 Media · 121 Statuses
🇪🇨 Engineer · PhD Student @MBZUAI Computer Vision
Abu Dhabi
Joined August 2017
Check out more results and our full paper: "Learning to Generate Object Interactions with Physics-Guided Video Diffusion" 🌐 https://t.co/cDxNlHuS1M 📜 https://t.co/cP2gkZxmpn Work done with: Ariana Bermudez, @HaoLi81, Fabio Pizzati, and Ivan Laptev!
• KineMask can also perform simple low-level motion 💨 control, moving different objects along different degrees of freedom and at different speeds.
• Emergence of Causality 🔥: KineMask, trained on object interactions, is tested with three different velocities on a real scene. As velocity increases, the resulting interactions also change, indicating that the model captures the causal structure of motion, a valuable property.
• Unlike other works, KineMask can generate realistic interactions with other objects, showing an understanding of rigid body dynamics. • We also show complex interactions that require implicit 3D understanding, making a pot or a glass of juice fall and crash as a result of the applied motion.
• We train a ControlNet using a low-level control signal, encoding the object velocity as a mask 🎭, and provide high-level textual control extracted from a VLM prediction. • At inference, KineMask infers high-level outcomes of object motion from a single frame 📷.
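The velocity-as-mask encoding mentioned above can be sketched roughly as follows (a minimal illustration, not the paper's exact encoding: it assumes a binary object segmentation mask and a 2D pixel-space velocity, and the function name, channel layout, and `max_speed` normalization are all hypothetical):

```python
import numpy as np

def velocity_mask(obj_mask: np.ndarray, velocity: tuple,
                  max_speed: float = 50.0) -> np.ndarray:
    """Rasterize a per-object velocity into a 2-channel control map.

    obj_mask: (H, W) binary segmentation of the object to move.
    velocity: (vx, vy) in pixels per frame.
    Returns a (2, H, W) float32 array: each channel carries the
    normalized velocity component wherever the object is, 0 elsewhere.
    """
    vx, vy = velocity
    control = np.zeros((2, *obj_mask.shape), dtype=np.float32)
    control[0][obj_mask > 0] = np.clip(vx / max_speed, -1.0, 1.0)
    control[1][obj_mask > 0] = np.clip(vy / max_speed, -1.0, 1.0)
    return control

# Example: a 4x4 frame with the object in the top-left corner,
# pushed to the right at 25 px/frame (half of max_speed).
mask = np.zeros((4, 4))
mask[:2, :2] = 1
ctrl = velocity_mask(mask, (25.0, 0.0))
```

A map like this can be fed to a ControlNet-style conditioning branch alongside the input frame, so the diffusion model sees *which* object should move and *how fast*, while the textual prompt supplies the high-level outcome.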
• We enable object-based control with a novel training strategy. Paired with synthetic data, KineMask 🎭 enables pretrained diffusion models to synthesize realistic object interactions in real-world input scenes.
🚀 Introducing KineMask 🎭, an approach for physics-guided video generation that enables realistic rigid-body control and interactions. Given a single image and an object velocity ❗, KineMask generates videos with inferred motions and future object interactions!!!
I hope that CVQA will motivate and push the frontier towards more advanced and diverse multimodal models. Check out our work and the oral presentation at NeurIPS: Paper: https://t.co/DaZ1qjEzuv Presentation: https://t.co/5eOryQPTd0 Dataset:
Pranjal Chitale discusses the 2024 NeurIPS work CVQA. Spanning 31 languages & the cultures of 30 countries, this VQA benchmark was created with native speakers & cultural experts to evaluate model performance across diverse linguistic & cultural contexts. https://t.co/N1DF79Em71
The CVQA project is currently being presented by @Davidromogr at the East Meeting Room 1-3, do drop by if you're around! Catch us as well at the poster sessions in the West Hall from 4:30 onwards :)
Proud of the depth and breadth of contributions from @Microsoft researchers at this year's @NeurIPSConf!
We’re excited to be a part of #NeurIPS2024! Explore the future of AI with over 100 groundbreaking papers, including oral and spotlight sessions, on reinforcement learning, advanced language model training, and multilingual, culturally inclusive benchmarks: https://t.co/QBDuYcDlT6
Our NeurIPS oral paper, CVQA, has been featured by Microsoft in a podcast with Pranjal Chitale, check it out!! Paper: https://t.co/9hLkIZGfm1 Data: https://t.co/pUCo89EfSo
https://t.co/eJpqmLzVLy
Excited to attend #NeurIPS2024 to present CVQA (culturally relevant, human-written multilingual VQA)!! Our recent work at @mbzuai. I'll be giving an oral session: 📍East Meeting Room 1-3 🗓️Thu 12 Dec 3:30 p.m. PST and the poster session: 📍West Ballroom A-D #5110 🗓️Thu 12 Dec
Happy to introduce our latest work, CVQA, a culturally diverse multilingual Visual QA benchmark, covering a rich set of languages and cultures from 28 countries across 4 continents. Work done @mbzuai w/ @Chenyang_Lyu, @haryoaw, @thamar_solorio, @AlhamFikri, and 70+ contributors
David (@Davidromogr), the first author of CVQA (culturally relevant, human-written multilingual VQA), is presenting our work at the @mbzuai student research showcase. Next week, he'll present it in an oral session at @NeurIPSConf. This is one of the works I'm most proud of!
🎉Happy to share our recent collaborative effort on building a culturally diverse, multilingual visual QA dataset! CVQA consists of over 9,000 questions across 28 countries, covering 26 languages (with more to be added!) 🌐 https://t.co/2JzSqhyMmp 📜 https://t.co/UbtRGvFvJ2