Yutong (Kelly) He
@electronickale
Followers 928 · Following 1K · Media 17 · Statuses 117
PhD student @mldcmu, I’m so delusional that doing generative modeling is my job
Pittsburgh, PA
Joined March 2021
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
3 replies · 31 reposts · 84 likes
New work: “GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models”. GLASS generates images by sampling stochastic Markov transitions with ODEs - allowing us to boost text-image alignment for large-scale models at inference time! https://t.co/unsuG3mYer [1/7]
3 replies · 58 reposts · 241 likes
OneFlow is a native multimodal model using just insertions. Scales better than Transfusion across a range of benchmarks up to 8B. New capabilities: naturally hierarchical generation, CFG for text detailedness, concurrent interleaved image+text generation, & more. https://t.co/LeDEiyRnQd
oneflow.framer.ai
OneFlow, the first non-autoregressive multimodal model that enables variable-length and concurrent mixed-modal generation.
Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow…
2 replies · 19 reposts · 150 likes
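The "CFG for text detailedness" bit maps onto standard classifier-free guidance. A minimal sketch of the generic CFG combination rule; the blending formula is textbook CFG, which I'm assuming rather than OneFlow's exact formulation:

```python
import numpy as np

def cfg(v_cond, v_uncond, w):
    """Classifier-free guidance: push the unconditional prediction
    toward (and past, for w > 1) the conditional one."""
    return v_uncond + w * (v_cond - v_uncond)

v_c, v_u = np.array([1.0, 0.0]), np.array([0.2, 0.1])
print(cfg(v_c, v_u, 0.0))  # w = 0: purely unconditional
print(cfg(v_c, v_u, 1.0))  # w = 1: purely conditional
print(cfg(v_c, v_u, 3.0))  # w > 1: extrapolate for more 'detailedness'
```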
Consistency models, CTMs, shortcut models, align your flow, mean flow... What's the connection, and how should you learn them in practice? We show they're all different sides of the same coin connected by one central object: the flow map. https://t.co/QBp1kELVhF 🧵(1/n)
5 replies · 69 reposts · 341 likes
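A worked toy to pin down the central object: the flow map psi_{s,t} carries a point at time s to time t along the probability-flow ODE, and composes (psi_{t,u} after psi_{s,t} equals psi_{s,u}). The linear ODE dx/dt = -x below is an illustrative assumption; consistency, shortcut, and mean-flow methods can be read as different ways of learning this one map.

```python
# Flow map as a numerical object, under a toy linear ODE dx/dt = -x.
import numpy as np

def flow_map(x, s, t, n_steps=1000):
    """psi_{s,t}(x): integrate dx/dt = -x from time s to time t (Euler)."""
    dt = (t - s) / n_steps
    for _ in range(n_steps):
        x = x + (-x) * dt
    return x

x = np.array([2.0])
# Semigroup property that makes the flow map 'one central object':
# psi_{t,u} o psi_{s,t} == psi_{s,u}
lhs = flow_map(flow_map(x, 1.0, 0.5), 0.5, 0.0)
rhs = flow_map(x, 1.0, 0.0)
print(lhs, rhs, np.exp(1.0) * x)   # all ~ 2e, the exact solution x * e^{s-t}
```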
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
7 replies · 90 reposts · 342 likes
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
9 replies · 87 reposts · 469 likes
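For a rough flavor of what an outcome-level exploration bonus could look like (my illustrative assumption, not the paper's exact intervention): reward answers the policy hasn't produced often, so probability mass doesn't collapse onto a single outcome.

```python
# Count-based bonus over final *outcomes* (answers), not states.
from collections import Counter
import math

outcome_counts = Counter()

def shaped_reward(answer, is_correct, bonus_scale=0.5):
    """Correctness reward plus a bonus that decays with outcome visitation."""
    outcome_counts[answer] += 1
    bonus = bonus_scale / math.sqrt(outcome_counts[answer])
    return float(is_correct) + bonus

# Two samples with the same answer: the second gets a smaller bonus.
print(shaped_reward("42", True))   # 1.0 + 0.5
print(shaped_reward("42", True))   # 1.0 + 0.5/sqrt(2)
print(shaped_reward("41", False))  # 0.0 + 0.5
```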
Excited to share our work Set Block Decoding! A new paradigm combining next-token-prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with exact KV cache. Arguably one of the simplest ways to accelerate LLMs!
3 replies · 24 reposts · 115 likes
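A toy sketch of block-wise parallel decoding in this spirit (a simplification under my own assumptions, not the paper's algorithm): append a block of MASK tokens, then unmask the most confident positions over a few parallel passes.

```python
# Toy block-wise parallel decoding; toy_model stands in for a transformer.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK = 100, -1

def toy_model(tokens):
    """Stand-in for a transformer: per-position logits over the vocabulary."""
    return rng.standard_normal((len(tokens), VOCAB))

def decode_block(prefix, block_size=8, passes=4):
    tokens = prefix + [MASK] * block_size
    for _ in range(passes):
        logits = toy_model(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Unmask the half of masked positions the model is most confident on.
        conf = {i: logits[i].max() for i in masked}
        for i in sorted(masked, key=conf.get, reverse=True)[: max(1, len(masked) // 2)]:
            tokens[i] = int(logits[i].argmax())
    # Any stragglers get filled in a final pass.
    logits = toy_model(tokens)
    return [int(logits[i].argmax()) if t == MASK else t for i, t in enumerate(tokens)]

print(decode_block([5, 17, 3]))
```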
Holy shi this is legit one of the most creative papers I’ve read in a while
🎨 Super impressive. Researchers built an optical diffusion model that uses light, not heavy computer math, to generate images with quality similar to standard AI diffusion but at near-zero power. This work shows a light-powered image generator that produces images with almost…
0 replies · 0 reposts · 1 like
At this point, I propose to cancel rebuttal altogether, you only get one shot, don’t miss your chance to blow, opportunity comes once in a lifetime, bro
6 replies · 1 repost · 40 likes
My rebuttal a year ago: Thank you so much for your thoughtful feedback; we would like to address your concerns below… My rebuttal last time: Here are our results. And I won’t thank you because I won’t have space. My rebuttal this time: Trust me bro
2 replies · 1 repost · 49 likes
Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
8 replies · 80 reposts · 518 likes
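A minimal sketch of the underlying data structure: position-relative insert/delete edits applied to a variable-length sequence. The random edit policy here is a toy stand-in for the learned flow.

```python
# Insert/delete edits on a variable-length sequence; no masks, no padding.
import random

random.seed(0)

def apply_edits(seq, edits):
    """Apply position-relative edits right-to-left so indices stay valid."""
    for op, pos, tok in sorted(edits, key=lambda e: e[1], reverse=True):
        if op == "insert":
            seq.insert(pos, tok)           # insert at pos == len(seq) appends
        elif op == "delete" and pos < len(seq):
            del seq[pos]
    return seq

seq = ["the", "cat"]
for _ in range(3):  # iterative refinement with a random toy edit policy
    edits = [(random.choice(["insert", "delete"]),
              random.randrange(len(seq) + 1),
              random.choice(["big", "sat", "mat"]))]
    seq = apply_edits(seq, edits)
    print(seq)
```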
RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n
21 replies · 146 reposts · 843 likes
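One natural instantiation of a self-provided reward is self-consistency: reward each sample by agreement with the majority answer over the model's own samples. A minimal sketch under that assumption (SRT's exact details may differ):

```python
# Majority-vote pseudo-reward over a model's own sampled answers.
from collections import Counter

def self_reward(sampled_answers):
    """Reward each sample by agreement with the majority answer."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in sampled_answers]

print(self_reward(["12", "12", "15", "12"]))  # [1.0, 1.0, 0.0, 1.0]
```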
LLOKI (a variant of Loki):
Integrating spatial transcriptomics across platforms is hard - different gene panels, sparse data. We introduce LLOKI, using optimal transport + single-cell FMs for unified ST integration. Work led by @ellie_haber (@mldcmu) & Lane Fellow @SpencerKrieger
1 reply · 1 repost · 8 likes
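For intuition on the optimal-transport piece, a hand-rolled entropic OT (Sinkhorn) alignment between two toy embedding sets. The random matrices stand in for single-cell foundation-model embeddings, and the uniform masses and regularization value are assumptions, not LLOKI's settings.

```python
# Entropic OT (Sinkhorn) coupling cells from two platforms via embeddings.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))   # platform A: 5 cells, toy embeddings
Y = rng.standard_normal((7, 16))   # platform B: 7 cells, toy embeddings

C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
K = np.exp(-C / 10.0)                                 # Gibbs kernel, reg = 10.0
a, b = np.full(5, 1 / 5), np.full(7, 1 / 7)           # uniform cell masses

u = np.ones(5)
for _ in range(200):                                  # Sinkhorn iterations
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]                       # transport plan (coupling)
print(P.sum(), P.sum(axis=1))                         # ~1, and row sums ~ a
```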
When the ddl is approaching and you are violently editing something you wrote a while ago
0 replies · 0 reposts · 17 likes
Check out our paper for all the details! 🔗 Project page: https://t.co/ZWzBaF0QVM 📄 arXiv: https://t.co/gShCTkrWia 💻 Code: https://t.co/5sU0tGgkm1 w/ @AlexRobey23 @smiurtitkii @yidingjiang @jnwilliams_cmu @pappasg69 @HamedSHassani @mittu1204 @rsalakhu @zicokolter 8/8
github.com
The official implementation of PRISM: Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation - KellyYutongHe/prism_demo
1 reply · 1 repost · 6 likes
🧩 Multi-concept generation becomes intuitive too! PRISM lets you easily identify and combine different components from reference images into coherent scenes. 7/🧵
1 reply · 0 reposts · 2 likes
✏️ Want to tweak your generated image? You can simply edit PRISM-generated prompts directly! Change outfits, poses, or backgrounds: just modify the part of the prompt you want. No more mysterious embeddings or gibberish prompts! 6/🧵
1 reply · 0 reposts · 4 likes
📊 In our experiments, PRISM outperforms/matches baselines across popular text-to-image models (Stable Diffusion, DALL-E, Midjourney) on T2I personalization, creating human-readable prompts with superior visual accuracy—no more manual prompt engineering headaches! 5/🧵
1 reply · 0 reposts · 2 likes
🧠 Our solution: Inspired by LLM jailbreaking (yup, really), PRISM iteratively refines human-readable prompts via in-context learning with VLMs. A magic trio powers this feedback loop: a Prompt Engineer Assistant (VLM), any T2I generator, and a Judge (another VLM)! 4/🧵
1 reply · 0 reposts · 3 likes
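Putting that trio into code shape: a hedged sketch of a PRISM-style feedback loop. The three callables are hypothetical placeholders for the VLM prompt engineer, the black-box T2I generator, and the VLM judge, not the API of the released repo.

```python
# Hedged sketch of a PRISM-style loop; callables are hypothetical placeholders.
def prism_loop(reference_image, propose, generate, judge, n_iters=5):
    """propose(ref, history) -> prompt; generate(prompt) -> image;
    judge(ref, image) -> (score, critique)."""
    history, best = [], (float("-inf"), None)
    for _ in range(n_iters):
        prompt = propose(reference_image, history)        # VLM prompt engineer
        image = generate(prompt)                          # any black-box T2I model
        score, critique = judge(reference_image, image)   # VLM judge
        history.append((prompt, score, critique))         # in-context feedback
        best = max(best, (score, prompt), key=lambda t: t[0])
    return best[1]                                        # best readable prompt

# Smoke test with trivial stand-ins:
print(prism_loop("ref",
                 lambda r, h: f"prompt #{len(h)}",
                 lambda p: p.upper(),
                 lambda r, i: (len(i), "make it closer to ref")))
```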