Zeeshan khan Profile
Zeeshan khan

@zeeshank95

Followers
68
Following
108
Media
8
Statuses
21

PhD @Inria Willow and @ENS_ULM

Paris, France
Joined May 2021
@zeeshank95
Zeeshan khan
2 months
Can text-to-image diffusion models handle surreal compositions beyond their training distribution? 🚨 Introducing ComposeAnything — composite object priors for diffusion models. 📸 More faithful, controllable generations — no retraining required. 🔗 1/9
2
9
24
@zeeshank95
Zeeshan khan
2 months
Below 👇 are some examples of complex prompts, the LLM-generated composite object priors, and the corresponding image generations. This work was done with @CordeliaSchmid and @chen_shizhe in the Willow team of @Inria Paris and @ENS_ULM. arXiv:
0
2
3
@zeeshank95
Zeeshan khan
2 months
We improve over prior inference-only and layout-conditioned training methods on the T2I-CompBench and NSR-1K benchmarks, on both automatic metrics and human evaluations. Our method generates high-quality images with compositions that faithfully reflect the text. 8/9
1
0
0
@zeeshank95
Zeeshan khan
2 months
We apply prior reinforcement 💪 and spatially controlled denoising 🎯 for the first few steps only, and then remove all control. This allows generative flexibility 🎨 and correction of objects' positions, orientations, sizes, and general appearance. 7/9
1
0
1
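The early-steps-only schedule described above can be sketched as a plain denoising loop that switches from a controlled step function to the free model after a few iterations. The function names and step signature here are illustrative assumptions, not the paper's actual API:

```python
def denoise_with_early_control(x_T, steps, denoise_step, controlled_step,
                               control_steps=3):
    """Apply prior reinforcement / spatial control only for the first
    `control_steps` iterations, then fall back to the uncontrolled model
    so it can freely adjust position, size, and appearance.
    """
    x = x_T
    for i in range(steps):
        step_fn = controlled_step if i < control_steps else denoise_step
        x = step_fn(x, i)  # each step maps x_t -> x_{t-1}
    return x
```

The key design point is that control is a schedule, not a constant: the composite prior anchors the global layout early, and the remaining steps are ordinary sampling.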
@zeeshank95
Zeeshan khan
2 months
To further enhance object-level spatial control in T2I generation, we propose a spatially controlled attention mechanism that explicitly strengthens the alignment between specific image regions and their corresponding region-level text descriptions. 🎯 6/9
1
0
1
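One common way to strengthen region-to-text alignment in cross-attention is an additive bias on the attention scores wherever an image patch falls inside the box of the object a token describes. This NumPy sketch illustrates that general idea; the bias value and mask construction are assumptions, not the paper's exact mechanism:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(Q, K, V, region_mask, bias=5.0):
    """Cross-attention with a spatial bias (sketch).

    Q: (P, d) image-patch queries; K, V: (T, d) text-token keys/values.
    region_mask: (P, T) boolean, True where patch p lies inside the 2D box
    of the object described by token t. The positive additive bias pushes
    each patch to attend mostly to its own region's tokens.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # standard scaled dot-product
    scores = scores + bias * region_mask      # boost in-region patch-token pairs
    return softmax(scores, axis=-1) @ V
```

Patches outside every box receive no bias and attend as usual, so global context is preserved while in-box patches are pulled toward their own description.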
@zeeshank95
Zeeshan khan
2 months
We propose a prior reinforcement algorithm 💪 that reinitializes the foreground prior at every step, while the background is denoised freely. This prevents excessive corruption of the foreground object prior while coherently generating the background. 5/9
1
0
1
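A minimal sketch of one such reinforcement step: inside a foreground mask, the current latent is overwritten with a freshly noised copy of the prior at the current noise level, while the background latent is left to evolve freely. The variance-preserving noising formula and function signature are illustrative assumptions:

```python
import numpy as np

def reinforce_foreground(x_t, prior, fg_mask, noise_level, rng=None):
    """One 'prior reinforcement' step (sketch).

    x_t:         current latent at this denoising step
    prior:       the composite object prior (clean)
    fg_mask:     boolean mask, True over foreground-object pixels
    noise_level: fraction of noise at the current timestep, in [0, 1]
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(prior.shape)
    # Re-noise the clean prior to match the current timestep.
    noised_prior = np.sqrt(1.0 - noise_level) * prior + np.sqrt(noise_level) * eps
    # Foreground: reset to the noised prior; background: keep x_t as-is.
    return np.where(fg_mask, noised_prior, x_t)
```

Because the reset happens at every step, the foreground never drifts far from the prior, while the freely denoised background can adapt around it.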
@zeeshank95
Zeeshan khan
2 months
The composite object prior is fed through the forward pass of the diffusion/flow-matching model, which gives us structured noise, initialized at a timestep t with ~90% noise. Generation then begins from this noisy image prior using the proposed prior-guided diffusion. 4/9
2
0
1
@zeeshank95
Zeeshan khan
2 months
Given a text prompt, an LLM with chain-of-thought reasoning plans a 2.5D object layout, decomposing the prompt into detailed object-level captions, 2D boxes, and relative depth values. We generate one object at a time and arrange them using the 2.5D layout, resulting in a composite object prior. 3/9
1
0
1
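The arrangement step can be sketched in NumPy: per-object images are pasted into a canvas in far-to-near depth order, so nearer objects occlude farther ones. The dict field names and the nearest-neighbour resize are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def composite_prior(objects, H=64, W=64):
    """Paste per-object crops into one canvas using a 2.5D layout (sketch).

    `objects` is a list of dicts with keys:
      'img'  : (h, w, 3) array, the independently generated object
      'box'  : (x0, y0, x1, y1) pixel coordinates on the canvas
      'depth': relative depth; smaller = closer to the camera
    """
    canvas = np.full((H, W, 3), 0.5, dtype=np.float32)  # neutral background
    for obj in sorted(objects, key=lambda o: -o['depth']):  # far -> near
        x0, y0, x1, y1 = obj['box']
        h, w = y1 - y0, x1 - x0
        img = obj['img']
        # Naive nearest-neighbour resize of the object crop to its box.
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas
```

Sorting by depth before pasting is what makes the layout "2.5D": the boxes give position and size, and the depth ordering resolves occlusion.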
@zeeshank95
Zeeshan khan
2 months
Generation starts from noise, but there's no notion of spatial structure or object intent. ⚠️ Result: models often ignore complex prompts or collapse into generic blobs. 💡 We replace this with structured noise, obtained by injecting noise into a coarse composition of the intended scene. 2/9
1
0
1
@zeeshank95
Zeeshan khan
3 months
RT @MakarandTapaswi: 🔔New @CVPR paper evaluating compositional reasoning of Video-LLMs on 10s, action-packed clips! 🥁 VELOCITI features 7…
0
11
0
@zeeshank95
Zeeshan khan
9 months
RT @gaur_manu: Can RL fine-tuning endow MLLMs with fine-grained visual understanding? Using our training recipe, we outperform SOTA open-s…
0
30
0
@zeeshank95
Zeeshan khan
10 months
RT @gaur_manu: 🚨 Introducing Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation. Given an image pair, it is easier for….
0
17
0
@zeeshank95
Zeeshan khan
1 year
RT @MakarandTapaswi: Thanks to the organizers (@davmoltisanti +) for an opportunity to share my thoughts at the amazing @CVPR Workshop "Wha….
0
12
0
@zeeshank95
Zeeshan khan
1 year
RT @phillip_isola: Our computer vision textbook is released! Foundations of Computer Vision, with Antonio Torralba and Bill Freeman. https:/…
0
405
0
@zeeshank95
Zeeshan khan
1 year
RT @FuteralMatthieu: Announcing mOSCAR, multilingual interleaved text-image corpus as part of @oscarnlp project. Paper: .
0
26
0
@zeeshank95
Zeeshan khan
1 year
RT @MakarandTapaswi: Given multiple short movie clips, can models generate coherent identity-aware descriptions? 🤔 Turns out, this is a com….
0
14
0
@zeeshank95
Zeeshan khan
1 year
RT @FuteralMatthieu: Excited to introduce MAD Speech: a new set of metrics to measure acoustic diversity in speech. Work done @GoogleDeepM….
0
13
0
@zeeshank95
Zeeshan khan
1 year
RT @MakarandTapaswi: 📢Happy to announce two #CVPR2024 papers from our Katha AI group @iiit_hyderabad! 🎉🎞️🔥 1. On 📺TV episode story summari…
0
9
0
@zeeshank95
Zeeshan khan
3 years
RT @MakarandTapaswi: Excited to receive Google's India Faculty Research Award 2022, my first Indian research grant 🙂 .
0
4
0
@zeeshank95
Zeeshan khan
3 years
RT @MakarandTapaswi: Given a short movie clip, can we identify who is doing what to/with whom, where, how & why? Our latest paper at @NeurI….
0
8
0