Currently @Google and Ph.D. candidate at @UW, working on AI at the intersection of vision and language. Previously 4x Research Intern @AmazonScience.
Thank you @_akhaliq for sharing our work!
Check out Diffuse2Choose, a virtual try-all model that allows users to try on any e-commerce item in any setting!
arXiv:
Website:
Thread:
Stay tuned for more!
Amazon presents Diffuse to Choose
Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
paper page:
Diffuse to Choose (DTC) allows users to virtually place any e-commerce item in any setting, ensuring detailed,
Introducing Quilt-LLaVA, a Large Language and Vision Assistant for #Pathology trained with spatially localized instruction-tuning data generated from educational #YouTube videos, outperforming SOTA in various tasks.
🌎:
📜:
A 🧵:
Quilt-1M has been accepted for an oral presentation at @NeurIPSConf. As promised, we have also released our data and our model:
See you all in New Orleans!
Introducing Quilt-1M: One Million Image-Text Pairs for Histopathology. With Quilt-1M, we propose a new source (video) and pipeline for collecting medical multimodal data!
📜:
💻:
🌎:
Demo:
Excited to share our latest work from my @AmazonScience internship: Diffuse2Choose, a zero-shot diffusion model that enables "Virtual Try-All" – the ability to virtually try on any e-commerce item!
🌎:
📜:
A 🧵:
Happy to share that we will be hosting The First Workshop on the Evaluation of Generative Foundational Models at @CVPR!
Are you an NLP or CV researcher with ideas on evaluating generative models? We seek innovative perspectives for talks and discussions, so please reach out!
Introducing Quilt-LLaVA, a Large Language and Vision Assistant for #Pathology trained with spatially localized instruction-tuning data generated from educational #YouTube videos, outperforming SOTA in various tasks.
🌎:
📜:
A 🧵:
Call for papers for the First Workshop on Evaluation for Generative Foundation Models at @CVPR. We're eager to hear your ideas on how generative models should be evaluated!
We also have a stellar speaker lineup: @THassner, Bo Li, @RanjayKrishna, Hanwang Zhang, @leokarlin,
After months of hard work, we are proud to introduce Quilt-1M, a large multimodal histopathology dataset curated from YouTube. We fully open-source our dataset and invite everyone to utilize this resource, so give it a try, people!
Introducing Quilt-1M: One Million Image-Text Pairs for Histopathology. With Quilt-1M, we propose a new source (video) and pipeline for collecting medical multimodal data!
📜:
💻:
🌎:
Demo:
@ak92501 @Meta We show that a similar reconstruction-based objective works pretty well on CNN+Transformer in a multi-modal setting! We published our preprint 2 months ago using internal Amazon data; now the results are coming with the open-source data, so stay tuned!
Thank you @Scobleizer for covering our work!
Check out Diffuse2Choose, a virtual try-all model that allows users to try on any e-commerce item in any setting!
📜:
🌎:
Thread:
Stay tuned for more!
Europeans never miss any opportunity to be blatantly racist, it seems. To my fellow Turkish students: you'll never see this kind of low-IQ garbage statement against students here in the US, so apply here instead.
Interesting story. Should universities in #Sweden or #Finland take action against students from #Turkey to put pressure on their government? Could it be a legitimate soft-power tactic to try and force Erdogan's hand?
Excited to share our preprint from my last internship at Amazon: DreamPaint! A framework to inpaint any e-commerce product on any user-provided image.
1/5
Visual Instruction Tuning in pathology faces challenges because existing PubMed-based datasets lack 1) spatial grounding for captions, which limits spatial awareness, and 2) holistic, whole slide image (WSI) level understanding, due to their focus on isolated, patch-level images.
In Iterative Abductive Reasoning, we facilitate a conversation between two GPT-4 agents. One, with access to just the patch-level caption, reasons toward a diagnosis. The other, informed by the diagnosis and its supporting facts, guides the first agent toward the diagnosis as they converse.
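A minimal sketch of how such a two-agent exchange could be orchestrated; `ask_gpt`, the prompt wording, and the stopping rule are hypothetical placeholders, not the paper's actual implementation:

```python
# Hypothetical sketch of Iterative Abductive Reasoning between two GPT agents.
# `ask_gpt` is a placeholder for any chat-completion call; prompts are illustrative.

def ask_gpt(system_prompt: str, history: list) -> str:
    """Placeholder: send `history` to a chat model under `system_prompt`, return its reply."""
    raise NotImplementedError

def iterative_abductive_reasoning(caption, diagnosis, facts, max_turns=4):
    student_sys = (f"You see only this patch-level caption: {caption}. "
                   "Reason step by step toward a diagnosis.")
    mentor_sys = (f"You know the diagnosis '{diagnosis}' and these facts: {facts}. "
                  "Without stating the answer outright, guide the other agent toward it.")
    history = []
    for _ in range(max_turns):
        guess = ask_gpt(student_sys, history)   # abductive step from the caption alone
        history.append(("student", guess))
        if diagnosis.lower() in guess.lower():  # stop once the diagnosis is reached
            break
        hint = ask_gpt(mentor_sys, history)     # hint grounded in the full-video facts
        history.append(("mentor", hint))
    return history                              # becomes a multi-turn training dialogue
```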
Educational videos also enable holistic WSI understanding, which is crucial in pathology. Pathologists accumulate evidence toward a diagnosis as they traverse WSIs in the videos, offering rich context for diagnosis. The frames they take time to explain distill the entire WSI.
We leverage educational videos where narrators use mouse pointers to show histopathological concepts. By capturing these pointer gestures, we effectively localize captions within the images, creating visually localized captions, and thereby generating spatially aware Q&A pairs.
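One plausible way to capture those pointer gestures on an otherwise static slide is simple frame differencing; this sketch (function name and threshold are my own assumptions, not the Quilt pipeline's actual detector) only illustrates the idea:

```python
# Hypothetical sketch: localize a narrator's mouse pointer by frame differencing.
# Real pipelines typically use a trained cursor detector; this illustrates the idea.
import cv2
import numpy as np

def pointer_locations(video_path: str, thresh: int = 40):
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    locs = []
    while ok:
        ok, frame = cap.read()
        if not ok:
            break
        # On a static slide, most changed pixels come from the moving cursor.
        diff = cv2.absdiff(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY))
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        ys, xs = np.nonzero(mask)
        if len(xs):  # centroid of changed pixels ~ pointer position
            locs.append((int(xs.mean()), int(ys.mean())))
        prev = frame
    cap.release()
    return locs  # cluster these over time into a region that grounds the caption
```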
@giffmana @sainingxie @endernewton @inkynumbers We show that a similar reconstruction-based objective works pretty well on CNN+Transformer in a multi-modal setting! We published our preprint 2 months ago using internal Amazon data; now the results are coming with the open-source data, so stay tuned!
In Complex Reasoning, given a caption alongside a diagnosis and supporting facts, we prompt GPT-4 with a diagnostic reasoning task designed to extrapolate beyond the immediate context of the given patch's caption while anchoring it to the facts extracted from the entire video.
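As a rough illustration, a prompt for such a task might be assembled like this; the wording is hypothetical, not the exact prompt used in the paper:

```python
# Illustrative prompt construction for Complex Reasoning (wording is hypothetical).

def complex_reasoning_prompt(caption: str, diagnosis: str, facts: list) -> str:
    fact_list = "\n".join(f"- {f}" for f in facts)
    return (
        "You are a pathologist generating an instruction-tuning example.\n"
        f"Patch caption: {caption}\n"
        f"Final diagnosis: {diagnosis}\n"
        f"Facts extracted from the full video:\n{fact_list}\n\n"
        "Write a question about the patch and an answer that reasons beyond the "
        "caption itself, but stays anchored to the facts above. Do not invent findings."
    )
```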
Our code, model, and demo will be available soon, so please stay tuned! Extremely thankful to my amazing manager Karim Bouyarmane and the team at @AmazonScience for this opportunity to work on cutting-edge research in generative AI!
To evaluate Quilt-LLaVA, we not only used public pathology VQA datasets but also developed Quilt-VQA, derived from naturally occurring questions and answers in our videos. It comprises 1,283 Q&A pairs, curated using GPT-4 and custom algorithms, for a comprehensive evaluation.
For training, we initialize Quilt-LLaVA from general-domain LLaVA in two stages: histopathology domain alignment with the Quilt dataset, then instruction tuning on Quilt-Instruct. We use LLaMA-7B as the language model and QuiltNet as the image encoder.
We introduce two novel prompting techniques to generate instruction-tuning data from videos: Complex Reasoning and Iterative Abductive Reasoning, which enable GPT-4 to extrapolate diagnoses from captions and facilitate conversations between two GPT-4 agents for diagnostic insights.
This is joint work with amazing folks at @AmazonScience: Maria Zontak, Bahar Erar Hood, Xu Zhang, Erran Li, Suren Kumar, and Karim Bouyarmane.
Stay tuned for the official announcement, which will include more details. Feel free to reach out anytime!
See you all in Seattle! 🎡
How will 15,000 personnel rescue people from 7,000 collapsed buildings? How can search and rescue be carried out with 2 people per building? Not 15,000, even 150,000 people would not be enough. 450,000-500,000 people must be mobilized, and the army's enlisted soldiers must join the effort. Are you mocking the nation's intelligence?
We figured out why the "positional encoding" used in NeRF works so well! NTK theory says that using an MLP with a Fourier basis yields a composed kernel that is good at interpolating (as per basic signal processing). Excited to see what people do with this.
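Concretely, the mapping is gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)] with a random Gaussian frequency matrix B, as in the Fourier features paper; here is a small NumPy sketch (the dimensions and sigma are chosen for illustration):

```python
# Gaussian Fourier feature mapping: gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)].
# Feeding gamma(v) instead of raw coordinates v to an MLP changes its NTK so it
# can interpolate high-frequency signals; sigma controls the bandwidth of B.
import numpy as np

def fourier_features(v, num_features=256, sigma=10.0, seed=0):
    rng = np.random.default_rng(seed)
    B = rng.normal(0.0, sigma, size=(num_features, v.shape[-1]))  # random frequencies
    proj = 2.0 * np.pi * v @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

# Example: map 2D pixel coordinates in [0,1]^2 to a 512-dim embedding for an MLP.
coords = np.stack(np.meshgrid(np.linspace(0, 1, 64),
                              np.linspace(0, 1, 64)), axis=-1).reshape(-1, 2)
emb = fourier_features(coords)  # shape (4096, 512)
```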
By placing the reference product directly into the user image, we approximate its appearance. A secondary UNet encoder processes this collage, generating pixel-level product signals, which are then modulated into the main UNet decoder via affine transformations using a FiLM layer.
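For reference, a feature-wise linear modulation (FiLM) layer in PyTorch might look like the sketch below; the channel counts and the conv-based conditioning head are illustrative assumptions, not the paper's exact architecture:

```python
# Minimal FiLM layer: the conditioning signal predicts per-channel scale (gamma)
# and shift (beta), which are applied to the decoder features as an affine map.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, cond_channels: int, feat_channels: int):
        super().__init__()
        self.to_gamma_beta = nn.Conv2d(cond_channels, 2 * feat_channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        return gamma * feats + beta  # feature-wise affine modulation

# Usage: decoder features (B, 256, 32, 32) modulated by hint-encoder features (B, 64, 32, 32).
film = FiLM(cond_channels=64, feat_channels=256)
out = film(torch.randn(2, 256, 32, 32), torch.randn(2, 64, 32, 32))
```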
@arankomatsuzaki @MetaAI We show that a similar reconstruction-based objective works pretty well on CNN+Transformer in a multi-modal setting! We published our preprint 2 months ago using internal Amazon data; now the results are coming with the open-source data, so stay tuned!
@Mr_AllenT Thank you for covering our work!
Please see the following thread for more information!
Also, please consider following me so you won't miss it when we share the codebase and the model 😅
Excited to share our latest work from my @AmazonScience internship: Diffuse2Choose, a zero-shot diffusion model that enables "Virtual Try-All" – the ability to virtually try on any e-commerce item!
🌎:
📜:
A 🧵:
Simply choose a product and take a picture of the desired location; DreamPaint then generates a realistic representation of the product in that location, blending it with the surrounding environment. It's a hassle-free way to preview how products will look in any setting.
2/5
Dr. Sahin said he and Dr. Türeci learned about efficacy data on Sunday night and marked the moment by brewing Turkish tea at home. "We celebrated, of course," he said. "It was a relief."
Meet Drs. Ugur Sahin and Özlem Türeci, the husband-and-wife team that founded the German company BioNTech, which has worked with Pfizer on a coronavirus vaccine found to be more than 90 percent effective.
Please come and see the QUILT oral today at 4 PM in Ballroom A-C, or find @wizdom_dominic, @fghezloo, and me at our poster session in Hall B1+B2 at #301 around 5 PM!
Introducing Quilt-1M: One Million Image-Text Pairs for Histopathology. With Quilt-1M, we propose a new source (video) and pipeline for collecting medical multimodal data!
📜:
💻:
🌎:
Demo:
DreamPaint generates much better images than text-only or image-only guidance models, as it preserves the fine-grained details of e-commerce items.
4/5
@alexcarliera Thank you @alexcarliera for covering our work! Please see the following thread for more information!
Also, please consider following me so you won't miss it when we share the codebase and the model 😅
Excited to share our latest work from my @AmazonScience internship: Diffuse2Choose, a zero-shot diffusion model that enables "Virtual Try-All" – the ability to virtually try on any e-commerce item!
🌎:
📜:
A 🧵:
A neat trick of combining Masked DreamBooth and Inpainting modules allows us to learn e-commerce items as unique entities through few-shot fine-tuning of our U-Net model (see the sketch below).
3/5
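A hedged sketch of what a masked DreamBooth-style objective might look like during few-shot fine-tuning; `unet` and its call signature are placeholders, not the actual DreamPaint code:

```python
# Hypothetical sketch of a "masked DreamBooth" objective: restrict the diffusion
# denoising loss to the masked (product) region so the U-Net learns the item's
# identity without memorizing backgrounds.
import torch
import torch.nn.functional as F

def masked_diffusion_loss(unet, noisy_latents, timesteps, text_emb, noise, mask):
    """mask: 1 inside the product region, 0 elsewhere (same spatial size as latents)."""
    pred = unet(noisy_latents, timesteps, text_emb)            # predicted noise
    per_pixel = F.mse_loss(pred, noise, reduction="none")      # standard eps-prediction loss
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1)  # average only over the mask
```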
Unlike recent virtual try-on literature, Diffuse2Choose can handle in-the-wild examples! Also, it is not limited to clothes but can also generate furniture, accessories, and more. Hence, it's a Virtual Try-All model!
Americans are losing their minds because they think $2 per hour is too low for a data annotation job in Kenya. It's slightly higher than the minimum wage in Turkey, which has 5x the GDP per capita of Kenya. $2 per hour is probably a middle-class wage there, so it isn't a sweatshop.
@thibaudz Thank you for covering our work!
Please see the following thread for more information!
Also, please consider following me so you won't miss it when we share the codebase and the model 😅
Excited to share our latest work from my @AmazonScience internship: Diffuse2Choose, a zero-shot diffusion model that enables "Virtual Try-All" – the ability to virtually try on any e-commerce item!
🌎:
📜:
A 🧵:
@natanielruizg Awesome work @natanielruizg! We did use "almost" the exact same architecture for a virtual try-on inpainting approach earlier.
Maybe you missed it. We'd appreciate it if you could cite us in the camera-ready version of the paper :)
@gijigae Thank you @gijigae for covering our work!
Please see the following thread for more information!
Also, please consider following me so you won't miss it when we share the codebase and the model 😅
Excited to share our latest work from my @AmazonScience internship: Diffuse2Choose, a zero-shot diffusion model that enables "Virtual Try-All" – the ability to virtually try on any e-commerce item!
🌎:
📜:
A 🧵:
We compare against both text-guided and image-guided inpainting modules and show that DreamPaint yields superior performance in both a subjective human study and quantitative metrics.
5/5
FreeU: Free Lunch in Diffusion U-Net
paper page:
We uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the
@_akhaliq How is this any different from DreamPaint?
Isn't it just masked DreamBooth for inpainting/outpainting? I guess it's cool not to cite preprints anymore.
@Scobleizer Thank you for covering our work!
Please see the following thread for more information!
Also, please consider following me so you won't miss it when we share the codebase and the model 😅
Excited to share our latest work from my @AmazonScience internship: Diffuse2Choose, a zero-shot diffusion model that enables "Virtual Try-All" – the ability to virtually try on any e-commerce item!
🌎:
📜:
A 🧵: