Zeeshan khan Profile
Zeeshan khan

@zeeshank95

Followers
68
Following
108
Media
8
Statuses
21

PhD @Inria Willow and @ENS_ULM

Paris, France
Joined May 2021
@zeeshank95
Zeeshan khan
2 months
Can text-to-image diffusion models handle surreal compositions beyond their training distribution? 🚨 Introducing ComposeAnything — composite object priors for diffusion models. 📸 More faithful, controllable generations — no retraining required. 🔗 1/9
2
9
24
@zeeshank95
Zeeshan khan
2 months
Below 👇 are some examples of complex prompts, the LLM-generated composite object priors, and the corresponding image generations. This work was done with @CordeliaSchmid and @chen_shizhe in the Willow team of @Inria Paris and @ENS_ULM. arXiv:
0
2
3
@zeeshank95
Zeeshan khan
2 months
We improve over prior inference-only and layout-conditioned training methods on the T2I-CompBench and NSR-1K benchmarks, on both automatic metrics and human evaluations. Our method generates high-quality images with compositions that faithfully reflect the text. 8/9
1
0
0
@zeeshank95
Zeeshan khan
2 months
We apply prior reinforcement 💪 and spatially controlled denoising 🎯 for the first few steps only, and then remove all control. This allows generative flexibility 🎨 and correction of objects' positions, orientations, sizes, and general appearance. 7/9
1
0
1
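The early-steps-only schedule described above can be sketched as a plain denoising loop that switches from a controlled step function to the free model after a few iterations. The function names and step signature here are illustrative assumptions, not the paper's actual API:

```python
def denoise_with_early_control(x_T, steps, denoise_step, controlled_step,
                               control_steps=3):
    """Apply prior reinforcement / spatial control only for the first
    `control_steps` iterations, then fall back to the uncontrolled model
    so it can freely adjust position, size, and appearance.
    """
    x = x_T
    for i in range(steps):
        step_fn = controlled_step if i < control_steps else denoise_step
        x = step_fn(x, i)  # each step maps x_t -> x_{t-1}
    return x
```

The key design point is that control is a schedule, not a constant: the composite prior anchors the global layout early, and the remaining steps are ordinary sampling.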
@zeeshank95
Zeeshan khan
2 months
To further enhance object-level spatial control in T2I generation, we propose a spatially controlled attention mechanism that explicitly strengthens the alignment between specific image regions and their corresponding region-level text descriptions. 🎯 6/9
1
0
1
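One common way to strengthen region-to-text alignment in cross-attention is an additive bias on the attention scores wherever an image patch falls inside the box of the object a token describes. This NumPy sketch illustrates that general idea; the bias value and mask construction are assumptions, not the paper's exact mechanism:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(Q, K, V, region_mask, bias=5.0):
    """Cross-attention with a spatial bias (sketch).

    Q: (P, d) image-patch queries; K, V: (T, d) text-token keys/values.
    region_mask: (P, T) boolean, True where patch p lies inside the 2D box
    of the object described by token t. The positive additive bias pushes
    each patch to attend mostly to its own region's tokens.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # standard scaled dot-product
    scores = scores + bias * region_mask      # boost in-region patch-token pairs
    return softmax(scores, axis=-1) @ V
```

Patches outside every box receive no bias and attend as usual, so global context is preserved while in-box patches are pulled toward their own description.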
@zeeshank95
Zeeshan khan
2 months
We propose a prior reinforcement algorithm 💪 that reinitializes the foreground prior at every step, while the background is denoised freely. This prevents excessive corruption of the foreground object prior while coherently generating the background. 5/9
1
0
1
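A minimal sketch of one such reinforcement step: inside a foreground mask, the current latent is overwritten with a freshly noised copy of the prior at the current noise level, while the background latent is left to evolve freely. The variance-preserving noising formula and function signature are illustrative assumptions:

```python
import numpy as np

def reinforce_foreground(x_t, prior, fg_mask, noise_level, rng=None):
    """One 'prior reinforcement' step (sketch).

    x_t:         current latent at this denoising step
    prior:       the composite object prior (clean)
    fg_mask:     boolean mask, True over foreground-object pixels
    noise_level: fraction of noise at the current timestep, in [0, 1]
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(prior.shape)
    # Re-noise the clean prior to match the current timestep.
    noised_prior = np.sqrt(1.0 - noise_level) * prior + np.sqrt(noise_level) * eps
    # Foreground: reset to the noised prior; background: keep x_t as-is.
    return np.where(fg_mask, noised_prior, x_t)
```

Because the reset happens at every step, the foreground never drifts far from the prior, while the freely denoised background can adapt around it.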
@zeeshank95
Zeeshan khan
2 months
The composite object prior is fed through the forward pass of the diffusion/flow-matching model, which gives us structured noise, initialized at a timestep t with ~90% noise. Generation then begins from this noisy image prior using the proposed prior-guided diffusion. 4/9
2
0
1
@zeeshank95
Zeeshan khan
2 months
Given a text prompt, an LLM with chain-of-thought reasoning plans a 2.5D object layout, decomposing the prompt into detailed object-level captions, 2D boxes, and relative depth values. We generate one object at a time and arrange them using the 2.5D layout, resulting in a composite object prior. 3/9
1
0
1
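The arrangement step can be sketched in NumPy: per-object images are pasted into a canvas in far-to-near depth order, so nearer objects occlude farther ones. The dict field names and the nearest-neighbour resize are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def composite_prior(objects, H=64, W=64):
    """Paste per-object crops into one canvas using a 2.5D layout (sketch).

    `objects` is a list of dicts with keys:
      'img'  : (h, w, 3) array, the independently generated object
      'box'  : (x0, y0, x1, y1) pixel coordinates on the canvas
      'depth': relative depth; smaller = closer to the camera
    """
    canvas = np.full((H, W, 3), 0.5, dtype=np.float32)  # neutral background
    for obj in sorted(objects, key=lambda o: -o['depth']):  # far -> near
        x0, y0, x1, y1 = obj['box']
        h, w = y1 - y0, x1 - x0
        img = obj['img']
        # Naive nearest-neighbour resize of the object crop to its box.
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas
```

Sorting by depth before pasting is what makes the layout "2.5D": the boxes give position and size, and the depth ordering resolves occlusion.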
@zeeshank95
Zeeshan khan
2 months
Generation starts from noise, but there's no notion of spatial structure or object intent. ⚠️ Result: models often ignore complex prompts or collapse into generic blobs. 💡 We replace this with structured noise, obtained by injecting noise into a coarse composition of the intended scene. 2/9
1
0
1
@zeeshank95
Zeeshan khan
3 months
RT @MakarandTapaswi: 🔔New @CVPR paper evaluating compositional reasoning of Video-LLMs on 10s, action-packed clips! 🥁 VELOCITI features 7…
0
11
0
@zeeshank95
Zeeshan khan
9 months
RT @gaur_manu: Can RL fine-tuning endow MLLMs with fine-grained visual understanding? Using our training recipe, we outperform SOTA open-s…
0
30
0
@zeeshank95
Zeeshan khan
10 months
RT @gaur_manu: 🚨 Introducing Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation. Given an image pair, it is easier for….
0
17
0
@zeeshank95
Zeeshan khan
1 year
RT @MakarandTapaswi: Thanks to the organizers (@davmoltisanti +) for an opportunity to share my thoughts at the amazing @CVPR Workshop "Wha….
0
12
0
@zeeshank95
Zeeshan khan
1 year
RT @phillip_isola: Our computer vision textbook is released! Foundations of Computer Vision, with Antonio Torralba and Bill Freeman. https:/…
0
405
0
@zeeshank95
Zeeshan khan
1 year
RT @FuteralMatthieu: Announcing mOSCAR, multilingual interleaved text-image corpus as part of @oscarnlp project. Paper: .
0
26
0
@zeeshank95
Zeeshan khan
1 year
RT @MakarandTapaswi: Given multiple short movie clips, can models generate coherent identity-aware descriptions? 🤔 Turns out, this is a com….
0
14
0
@zeeshank95
Zeeshan khan
1 year
RT @FuteralMatthieu: Excited to introduce MAD Speech: a new set of metrics to measure acoustic diversity in speech. Work done @GoogleDeepM….
0
13
0
@zeeshank95
Zeeshan khan
1 year
RT @MakarandTapaswi: 📢Happy to announce two #CVPR2024 papers from our Katha AI group @iiit_hyderabad! 🎉🎞️🔥 1. On 📺TV episode story summari…
0
9
0
@zeeshank95
Zeeshan khan
3 years
RT @MakarandTapaswi: Excited to receive Google's India Faculty Research Award 2022, my first Indian research grant 🙂 .
0
4
0
@zeeshank95
Zeeshan khan
3 years
RT @MakarandTapaswi: Given a short movie clip, can we identify who is doing what to/with whom, where, how & why? Our latest paper at @NeurI….
0
8
0