Grace Luo Profile
Grace Luo

@graceluo_

Followers: 921 · Following: 518 · Media: 16 · Statuses: 34

phd student @berkeley_ai, vision + language

Joined June 2021
@graceluo_
Grace Luo
1 month
✨ New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation* at inference time. The result is flexible control: parameterize tasks as multimodal inputs, visually inspect the images with the VLM, and update the generator. 🧵
💬 21 · 🔁 176 · ❤️ 1K
@graceluo_
Grace Luo
1 month
(6/n) Special thanks to my co-authors @jongranskog, @holynski_, @trevordarrell! This was a great academic collaboration with @runwayml, especially @agermanidis, who was super supportive of our very experimental ideas throughout the entire process.
💬 1 · 🔁 0 · ❤️ 14
@graceluo_
Grace Luo
1 month
(5/n) Research is non-linear! We started this project more than a year ago – first by training a diffusion captioner (a VLM that encodes images with diffusion hyperfeatures). We're not working on that direction anymore, but here's a peek at that first prototype:
💬 1 · 🔁 0 · ❤️ 22
@graceluo_
Grace Luo
1 month
(4/n) Check out our paper + code! Our codebase lets you mix and match different off-the-shelf image generators and VLMs, and can run on Nvidia RTX 4090s. Page: Paper: Code:
💬 1 · 🔁 2 · ❤️ 33
@graceluo_
Grace Luo
1 month
(3/n) You can get pretty creative, because VLMs afford a flexible interface. We play around with implementing spatial controls via *visual prompting*: we overlay the control on the image, ask the VLM whether the two match, then optimize for an image that matches the control.
💬 2 · 🔁 0 · ❤️ 18
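To make the visual-prompting idea above concrete, here is a minimal sketch (mine, not the released code): blend the control onto the image and score the match with the VLM. The helper vlm.instruction_loss is the same hypothetical stand-in used in the sketch after tweet (2/n) below.

def spatial_control_loss(vlm, image, control, alpha=0.5,
                         question="Does the image match the overlaid control?"):
    # Visual prompt: alpha-blend the control (e.g., a pose or edge map rendered
    # as an RGB image) on top of the current generated image.
    composite = alpha * control + (1 - alpha) * image
    # Score the composite with a yes/no question; minimizing the negative
    # log-likelihood of "yes" pushes the generator toward matching the control.
    return vlm.instruction_loss(composite, question, desired_answer="yes")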
@graceluo_
Grace Luo
1 month
(2/n) Our method reuses the visual instruction tuning loss originally used to train the VLM to instead optimize the weights of the image generator.
💬 1 · 🔁 0 · ❤️ 28
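A minimal sketch of that optimization loop, under loud assumptions: generator.sample() and vlm.instruction_loss(...) are hypothetical stand-ins rather than the released API, and the optimizer would plausibly target a small set of generator parameters (e.g., a LoRA adapter) rather than the full model.

import torch

def dual_process_step(generator, vlm, question, desired_answer, optimizer):
    # 1. Feed-forward generation, kept differentiable through the generator.
    image = generator.sample()
    # 2. Visual instruction tuning loss: negative log-likelihood of the desired
    #    answer given (image, question), under the frozen VLM.
    loss = vlm.instruction_loss(image, question, desired_answer)
    # 3. Backprop through the VLM and the image into the generator's trainable
    #    parameters, then take one update step at inference time.
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return image.detach(), loss.item()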
@graceluo_
Grace Luo
8 months
(7/n) Check out our paper and website for more info! Website: Paper: Code:
💬 0 · 🔁 0 · ❤️ 4
@graceluo_
Grace Luo
8 months
(6/n) The idea of a task vector is not new; we study its cross-modal properties, following prior work on LLMs (e.g., …).
💬 1 · 🔁 0 · ❤️ 5
@graceluo_
Grace Luo
8 months
(5/n) The most surprising result, at least to me, is that task vectors can be patched from the base LLM into its corresponding fine-tuned VLM. Here we patch from Mistral to Idefics2. This means the VLM can repurpose functions learned entirely in language and apply them to image queries.
💬 1 · 🔁 0 · ❤️ 5
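A hedged illustration of the patching step: final_token_hidden and generate_with_patch are hypothetical hook helpers, not real model methods, and the layer choice is a free parameter. What makes the transfer plausible is that the VLM is fine-tuned from the same language backbone.

import torch

@torch.no_grad()
def cross_model_patch(llm, vlm, text_task_prompt, image_query, layer):
    # 1. Task vector: the hidden state of the final prompt token at `layer`
    #    when the base LLM (e.g., Mistral) reads a text spec of the task.
    task_vec = llm.final_token_hidden(text_task_prompt, layer)
    # 2. Patch: run the fine-tuned VLM (e.g., Idefics2) on an image query, but
    #    overwrite that position's hidden state at `layer` with the task vector.
    return vlm.generate_with_patch(image_query, layer, task_vec)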
@graceluo_
Grace Luo
8 months
(4/n) Motivated by this similarity in task representations, we explore mixing and matching the task specification and the query format. We call this cross-modal patching.
💬 1 · 🔁 0 · ❤️ 2
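For intuition, a toy enumeration of the specification/query grid (contents invented; the real tasks and prompts are in the paper):

task_specs = {
    "text_icl":    "hot : cold\nbig : small\n",           # task shown via text examples
    "instruction": "Output the antonym of the word.\n",   # task shown via an instruction
    "image_icl":   ["<img:hot>", "<img:cold>"],           # task shown via image examples
}
queries = {"text": "fast :", "image": "<img:fast>"}
# Cross-modal patching pairs any specification with any query modality, e.g.
# extract the task vector under "text_icl" and patch it into a forward pass
# on the image query.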
@graceluo_
Grace Luo
8 months
(3/n) We start by looking at how the model processes the final token to execute the task. Conditioned on either text or image ICL, this token undergoes three phases across model layers: it first resembles the input (the colon token), then a meta-summary of the task, then the final answer.
💬 1 · 🔁 0 · ❤️ 1
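One standard way to visualize such a trajectory is a logit-lens probe: decode the final position's hidden state at every layer through the output head. The sketch below assumes a Hugging Face Llama/Mistral-style causal LM where model.model.norm and model.lm_head are exposed; it illustrates the technique, not the paper's exact analysis code.

import torch

@torch.no_grad()
def final_token_trajectory(model, tokenizer, prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    hs = model(**inputs, output_hidden_states=True).hidden_states
    for layer, h in enumerate(hs):
        # Unembed the last position: apply the final norm, then the LM head.
        logits = model.lm_head(model.model.norm(h[:, -1]))
        print(layer, tokenizer.decode(int(logits.argmax())))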
@graceluo_
Grace Luo
8 months
(2/n) Our main finding is that task representations in VLMs are consistent across modality (text, image) and specification (example, instruction).
💬 1 · 🔁 0 · ❤️ 3
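A rough way to quantify that consistency (a sketch, not the paper's protocol) is the pairwise cosine similarity between final-token hidden states under each specification:

import torch.nn.functional as F

def pairwise_task_similarity(task_vectors):
    # task_vectors: dict mapping a specification name ("text_icl",
    # "instruction", "image_icl") to a final-token hidden state (1-D tensor).
    names = list(task_vectors)
    return {(a, b): F.cosine_similarity(task_vectors[a], task_vectors[b], dim=0).item()
            for i, a in enumerate(names) for b in names[i + 1:]}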
@graceluo_
Grace Luo
8 months
In a new preprint, we show that VLMs can perform cross-modal tasks, since text ICL 📚, instructions 📋, and image ICL 🖼️ are all compressed into similar task representations. See "Task Vectors are Cross-Modal", work w/ @trevordarrell, @_amirbar.
💬 5 · 🔁 18 · ❤️ 99
@graceluo_
Grace Luo
9 months
Our Knowledge in Generative Models workshop at #ECCV2024 is happening in a few hours! ⏰ Monday, Sept 30th, 2-6PM CEST. 📍 Brown 2 (note the location change from Brown 1). 🔗
@anand_bhattad
Anand Bhattad
9 months
We are organizing a new workshop on "Knowledge in Generative Models" at #ECCV2024 to explore how generative models learn representations of the visual world and how we can use them for downstream applications. For the schedule and more details, visit our website. 🔗 Website:
💬 1 · 🔁 0 · ❤️ 10
@graceluo_
Grace Luo
1 year
Come drop by our poster for 🔮 Readout Guidance at #CVPR2024 this Wednesday! 📍 Arch 4A-E, Poster #332. 📅 Wed 19 June, 5-6:30PM PST. 🌐🔗 w/ @trevordarrell, @oliver_wang2, @danbgoldman, @holynski_.
💬 0 · 🔁 4 · ❤️ 17
@graceluo_
Grace Luo
1 year
Update on 🔮 Readout Guidance! We open-sourced the code – check out our demos, model weights, and training code. Here's a teaser of what you can do with our method:
💬 4 · 🔁 50 · ❤️ 242
@graceluo_
Grace Luo
2 years
Check out our poster for Diffusion Hyperfeatures at #NeurIPS2023 today! TL;DR: we distill descriptors from diffusion features. 📍 Poster #607. ⏰ Wed 13 Dec, 5-7PM CT. 🌐 w/ @lisabdunlap, @dhpSeth, @holynski_, @trevordarrell
💬 0 · 🔁 5 · ❤️ 57
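In spirit (my paraphrase, not the released code), the distillation learns a mixture over denoising steps and UNet layers, producing one descriptor map per image:

import torch
import torch.nn as nn

class HyperfeatureAggregator(nn.Module):
    def __init__(self, n_steps, n_layers, in_dim, out_dim):
        super().__init__()
        self.mix = nn.Parameter(torch.zeros(n_steps, n_layers))  # learned mixing logits
        self.proj = nn.Conv2d(in_dim, out_dim, kernel_size=1)    # shared 1x1 projection

    def forward(self, feats):
        # feats: (steps, layers, C, H, W), features already resized to one resolution.
        w = self.mix.flatten().softmax(0).view(*self.mix.shape, 1, 1, 1)
        pooled = (w * feats).sum(dim=(0, 1))      # weighted sum over steps and layers
        return self.proj(pooled.unsqueeze(0))[0]  # per-pixel descriptor map (out_dim, H, W)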
@graceluo_
Grace Luo
2 years
(6/n) Finally, readout heads are accessible both to train and to use: we can train a readout head (5.9M params) for Stable Diffusion XL on as few as 100 samples in 3 hours on a single Nvidia A100 40GB GPU.
💬 0 · 🔁 0 · ❤️ 4
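A back-of-the-envelope version of that training setup, with frozen_diffusion.features as a hypothetical hook for extracting intermediate UNet features (not the released training code):

import torch
import torch.nn.functional as F

def train_readout_head(head, frozen_diffusion, dataset, steps=5000, lr=1e-4):
    # Only the small head (~5.9M params for SDXL) trains; the diffusion model is frozen.
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for step in range(steps):
        image, target = dataset[step % len(dataset)]     # as few as ~100 labeled pairs
        t = torch.randint(0, 1000, ())                   # random denoising timestep
        with torch.no_grad():
            feats = frozen_diffusion.features(image, t)  # hypothetical feature hook
        loss = F.mse_loss(head(feats, t), target)        # regress the spatial property map
        opt.zero_grad(); loss.backward(); opt.step()
    return head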
@graceluo_
Grace Luo
2 years
(5/n) We can also train readout heads that encode arbitrary spatial relationships. These heads can be used to drag and deform objects, including in real image edits. All results use the same guidance framework, without other techniques such as per-example finetuning.
💬 1 · 🔁 0 · ❤️ 3
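For context, guidance with a readout head can be sketched as a classifier-guidance-style correction inside a diffusers-style sampling loop; denoise_with_features is a hypothetical wrapper that returns both the noise prediction and the intermediate UNet features:

import torch
import torch.nn.functional as F

def readout_guided_step(scheduler, denoise_with_features, head, x_t, t, target, scale=1.0):
    x_t = x_t.detach().requires_grad_(True)
    eps, feats = denoise_with_features(x_t, t)  # noise prediction + UNet features
    loss = F.mse_loss(head(feats, t), target)   # e.g., a target drag / correspondence map
    grad = torch.autograd.grad(loss, x_t)[0]
    eps = eps + scale * grad                    # classifier-guidance-style correction
    return scheduler.step(eps, t, x_t.detach()).prev_sample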