Grace Luo Profile
Grace Luo

@graceluo_

Followers: 921 · Following: 518 · Media: 16 · Statuses: 34

phd student @berkeley_ai, vision + language

Joined June 2021
@graceluo_
Grace Luo
1 month
✨ New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation* at inference time. The result is flexible control: parameterize tasks as multimodal inputs, visually inspect the images with the VLM, and update the generator. 🧵
💬 21 · 🔁 176 · ❤️ 1K
@graceluo_
Grace Luo
1 month
(6/n) Special thanks to my co-authors @jongranskog, @holynski_, @trevordarrell! This was a great academic collaboration with @runwayml, especially @agermanidis, who was super supportive of our very experimental ideas throughout the entire process.
💬 1 · 🔁 0 · ❤️ 14
@graceluo_
Grace Luo
1 month
(5/n) Research is non-linear! We started this project more than a year ago – first by training a diffusion captioner (a VLM that encodes images with diffusion hyperfeatures). We're not working on that direction anymore, but here's a peek at that first prototype:
💬 1 · 🔁 0 · ❤️ 22
@graceluo_
Grace Luo
1 month
(4/n) Check out our paper + code! Our codebase lets you mix and match different off-the-shelf image generators and VLMs, and can run on Nvidia RTX 4090s. Page: Paper: Code:
💬 1 · 🔁 2 · ❤️ 33
@graceluo_
Grace Luo
1 month
(3/n) You can get pretty creative, because VLMs afford a flexible interface. We play around with implementing spatial controls via *visual prompting*: we overlay the control on the image, ask the VLM whether the two match, then optimize for an image that matches the control.
💬 2 · 🔁 0 · ❤️ 18
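To make the visual-prompting idea above concrete, here is a minimal sketch (mine, not the released code): blend the control onto the image and score the match with the VLM. The helper vlm.instruction_loss is the same hypothetical stand-in used in the sketch after tweet (2/n) below.

def spatial_control_loss(vlm, image, control, alpha=0.5,
                         question="Does the image match the overlaid control?"):
    # Visual prompt: alpha-blend the control (e.g., a pose or edge map rendered
    # as an RGB image) on top of the current generated image.
    composite = alpha * control + (1 - alpha) * image
    # Score the composite with a yes/no question; minimizing the negative
    # log-likelihood of "yes" pushes the generator toward matching the control.
    return vlm.instruction_loss(composite, question, desired_answer="yes")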
@graceluo_
Grace Luo
1 month
(2/n) Our method reuses the visual instruction tuning loss originally used to train the VLM to instead optimize the weights of the image generator.
💬 1 · 🔁 0 · ❤️ 28
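A minimal sketch of that optimization loop, under loud assumptions: generator.sample() and vlm.instruction_loss(...) are hypothetical stand-ins rather than the released API, and the optimizer would plausibly target a small set of generator parameters (e.g., a LoRA adapter) rather than the full model.

import torch

def dual_process_step(generator, vlm, question, desired_answer, optimizer):
    # 1. Feed-forward generation, kept differentiable through the generator.
    image = generator.sample()
    # 2. Visual instruction tuning loss: negative log-likelihood of the desired
    #    answer given (image, question), under the frozen VLM.
    loss = vlm.instruction_loss(image, question, desired_answer)
    # 3. Backprop through the VLM and the image into the generator's trainable
    #    parameters, then take one update step at inference time.
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return image.detach(), loss.item()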
@graceluo_
Grace Luo
8 months
(7/n) Check out our paper and website for more info! Website: Paper: Code:
💬 0 · 🔁 0 · ❤️ 4
@graceluo_
Grace Luo
8 months
(6/n) The idea of a task vector is not new; we study its cross-modal properties, following prior work on LLMs (e.g., …).
💬 1 · 🔁 0 · ❤️ 5
@graceluo_
Grace Luo
8 months
(5/n) The most surprising result, at least to me, is that task vectors can be patched from the base LLM into its corresponding fine-tuned VLM. Here we patch from Mistral to Idefics2. This means the VLM can repurpose functions learned entirely in language and apply them to image queries.
💬 1 · 🔁 0 · ❤️ 5
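A hedged illustration of the patching step: final_token_hidden and generate_with_patch are hypothetical hook helpers, not real model methods, and the layer choice is a free parameter. What makes the transfer plausible is that the VLM is fine-tuned from the same language backbone.

import torch

@torch.no_grad()
def cross_model_patch(llm, vlm, text_task_prompt, image_query, layer):
    # 1. Task vector: the hidden state of the final prompt token at `layer`
    #    when the base LLM (e.g., Mistral) reads a text spec of the task.
    task_vec = llm.final_token_hidden(text_task_prompt, layer)
    # 2. Patch: run the fine-tuned VLM (e.g., Idefics2) on an image query, but
    #    overwrite that position's hidden state at `layer` with the task vector.
    return vlm.generate_with_patch(image_query, layer, task_vec)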
@graceluo_
Grace Luo
8 months
(4/n) Motivated by this similarity in task representations, we explore mixing and matching the task specification and the query format. We call this cross-modal patching.
💬 1 · 🔁 0 · ❤️ 2
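For intuition, a toy enumeration of the specification/query grid (contents invented; the real tasks and prompts are in the paper):

task_specs = {
    "text_icl":    "hot : cold\nbig : small\n",           # task shown via text examples
    "instruction": "Output the antonym of the word.\n",   # task shown via an instruction
    "image_icl":   ["<img:hot>", "<img:cold>"],           # task shown via image examples
}
queries = {"text": "fast :", "image": "<img:fast>"}
# Cross-modal patching pairs any specification with any query modality, e.g.
# extract the task vector under "text_icl" and patch it into a forward pass
# on the image query.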
@graceluo_
Grace Luo
8 months
(3/n) We start by looking at how the model processes the final token to execute the task. Conditioned on either text or image ICL, this token undergoes three phases across model layers: it first resembles the input (the colon token), then a meta-summary of the task, then the final answer.
💬 1 · 🔁 0 · ❤️ 1
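One standard way to visualize such a trajectory is a logit-lens probe: decode the final position's hidden state at every layer through the output head. The sketch below assumes a Hugging Face Llama/Mistral-style causal LM where model.model.norm and model.lm_head are exposed; it illustrates the technique, not the paper's exact analysis code.

import torch

@torch.no_grad()
def final_token_trajectory(model, tokenizer, prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    hs = model(**inputs, output_hidden_states=True).hidden_states
    for layer, h in enumerate(hs):
        # Unembed the last position: apply the final norm, then the LM head.
        logits = model.lm_head(model.model.norm(h[:, -1]))
        print(layer, tokenizer.decode(int(logits.argmax())))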
@graceluo_
Grace Luo
8 months
(2/n) Our main finding is that task representations in VLMs are consistent across modality (text, image) and specification (example, instruction).
💬 1 · 🔁 0 · ❤️ 3
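A rough way to quantify that consistency (a sketch, not the paper's protocol) is the pairwise cosine similarity between final-token hidden states under each specification:

import torch.nn.functional as F

def pairwise_task_similarity(task_vectors):
    # task_vectors: dict mapping a specification name ("text_icl",
    # "instruction", "image_icl") to a final-token hidden state (1-D tensor).
    names = list(task_vectors)
    return {(a, b): F.cosine_similarity(task_vectors[a], task_vectors[b], dim=0).item()
            for i, a in enumerate(names) for b in names[i + 1:]}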
@graceluo_
Grace Luo
8 months
In a new preprint, we show that VLMs can perform cross-modal tasks, since text ICL 📚, instructions 📋, and image ICL 🖼️ are all compressed into similar task representations. See "Task Vectors are Cross-Modal", work w/ @trevordarrell, @_amirbar.
💬 5 · 🔁 18 · ❤️ 99
@graceluo_
Grace Luo
9 months
Our Knowledge in Generative Models workshop at #ECCV2024 is happening in a few hours! ⏰ Monday, Sept 30th, 2-6PM CEST. 📍 Brown 2 (note the location change from Brown 1). 🔗
@anand_bhattad
Anand Bhattad
9 months
We are organizing a new workshop on "Knowledge in Generative Models" at #ECCV2024 to explore how generative models learn representations of the visual world and how we can use them for downstream applications. For the schedule and more details, visit our website. 🔗 Website:
💬 1 · 🔁 0 · ❤️ 10
@graceluo_
Grace Luo
1 year
Come drop by our poster for 🔮 Readout Guidance at #CVPR2024 this Wednesday! 📍 Arch 4A-E, Poster #332. 📅 Wed 19 June, 5-6:30PM PST. 🌐🔗 w/ @trevordarrell, @oliver_wang2, @danbgoldman, @holynski_.
💬 0 · 🔁 4 · ❤️ 17
@graceluo_
Grace Luo
1 year
Update on 🔮 Readout Guidance! We open-sourced the code – check out our demos, model weights, and training code. Here's a teaser of what you can do with our method:
💬 4 · 🔁 50 · ❤️ 242
@graceluo_
Grace Luo
2 years
Check out our poster for Diffusion Hyperfeatures at #NeurIPS2023 today! TL;DR: we distill descriptors from diffusion features. 📍 Poster #607. ⏰ Wed 13 Dec, 5-7PM CT. 🌐 w/ @lisabdunlap, @dhpSeth, @holynski_, @trevordarrell
💬 0 · 🔁 5 · ❤️ 57
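In spirit (my paraphrase, not the released code), the distillation learns a mixture over denoising steps and UNet layers, producing one descriptor map per image:

import torch
import torch.nn as nn

class HyperfeatureAggregator(nn.Module):
    def __init__(self, n_steps, n_layers, in_dim, out_dim):
        super().__init__()
        self.mix = nn.Parameter(torch.zeros(n_steps, n_layers))  # learned mixing logits
        self.proj = nn.Conv2d(in_dim, out_dim, kernel_size=1)    # shared 1x1 projection

    def forward(self, feats):
        # feats: (steps, layers, C, H, W), features already resized to one resolution.
        w = self.mix.flatten().softmax(0).view(*self.mix.shape, 1, 1, 1)
        pooled = (w * feats).sum(dim=(0, 1))      # weighted sum over steps and layers
        return self.proj(pooled.unsqueeze(0))[0]  # per-pixel descriptor map (out_dim, H, W)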
@graceluo_
Grace Luo
2 years
(6/n) Finally, readout heads are accessible both to train and to use: we can train a readout head (5.9M params) for Stable Diffusion XL on as few as 100 samples in 3 hours on a single Nvidia A100 40GB GPU.
💬 0 · 🔁 0 · ❤️ 4
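A back-of-the-envelope version of that training setup, with frozen_diffusion.features as a hypothetical hook for extracting intermediate UNet features (not the released training code):

import torch
import torch.nn.functional as F

def train_readout_head(head, frozen_diffusion, dataset, steps=5000, lr=1e-4):
    # Only the small head (~5.9M params for SDXL) trains; the diffusion model is frozen.
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for step in range(steps):
        image, target = dataset[step % len(dataset)]     # as few as ~100 labeled pairs
        t = torch.randint(0, 1000, ())                   # random denoising timestep
        with torch.no_grad():
            feats = frozen_diffusion.features(image, t)  # hypothetical feature hook
        loss = F.mse_loss(head(feats, t), target)        # regress the spatial property map
        opt.zero_grad(); loss.backward(); opt.step()
    return head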
@graceluo_
Grace Luo
2 years
(5/n) We can also train readout heads that encode arbitrary spatial relationships. These heads can be used to drag and deform objects, including in real image edits. All results use the same guidance framework, without other techniques such as per-example finetuning.
💬 1 · 🔁 0 · ❤️ 3
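For context, guidance with a readout head can be sketched as a classifier-guidance-style correction inside a diffusers-style sampling loop; denoise_with_features is a hypothetical wrapper that returns both the noise prediction and the intermediate UNet features:

import torch
import torch.nn.functional as F

def readout_guided_step(scheduler, denoise_with_features, head, x_t, t, target, scale=1.0):
    x_t = x_t.detach().requires_grad_(True)
    eps, feats = denoise_with_features(x_t, t)  # noise prediction + UNet features
    loss = F.mse_loss(head(feats, t), target)   # e.g., a target drag / correspondence map
    grad = torch.autograd.grad(loss, x_t)[0]
    eps = eps + scale * grad                    # classifier-guidance-style correction
    return scheduler.step(eps, t, x_t.detach()).prev_sample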