Yutong (Kelly) He Profile
Yutong (Kelly) He

@electronickale

Followers
928
Following
1K
Media
17
Statuses
117

PhD student @mldcmu, I’m so delusional that doing generative modeling is my job

Pittsburgh, PA
Joined March 2021
@electronickale
Yutong (Kelly) He
6 months
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
3
31
84
@peholderrieth
Peter Holderrieth
18 days
New work: “GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models”. GLASS generates images by sampling stochastic Markov transitions with ODEs - allowing us to boost text-image alignment for large-scale models at inference time! https://t.co/unsuG3mYer [1/7]
3
58
241
@RickyTQChen
Ricky T. Q. Chen
20 days
OneFlow is a native multimodal model using just insertions. Scales better than Transfusion across a range of benchmarks up to 8B. New capabilities: naturally hierarchical generation, CFG for text detailedness, concurrent image+text interleaved, & more. https://t.co/LeDEiyRnQd
oneflow.framer.ai
OneFlow, the first non-autoregressive multimodal model that enables variable-length and concurrent mixed-modal generation.
@__JohnNguyen__
John Nguyen
20 days
Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow
2
19
150
@nmboffi
Nicholas Boffi
20 days
Consistency models, CTMs, shortcut models, align your flow, mean flow... What's the connection, and how should you learn them in practice? We show they're all different sides of the same coin connected by one central object: the flow map. https://t.co/QBp1kELVhF 🧵(1/n)
5
69
341
@dylanjsam
Dylan Sam
1 month
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
7
90
342
@yus167
Yuda Song
2 months
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
9
87
469
@helibenhamu
Heli Ben-Hamu
2 months
Excited to share our work Set Block Decoding! A new paradigm combining next-token-prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with exact KV cache. Arguably one of the simplest ways to accelerate LLMs!
3
24
115
@electronickale
Yutong (Kelly) He
2 months
Holy shi this is legit one of the most creative papers I’ve read in a while
@rohanpaul_ai
Rohan Paul
2 months
🎨 Super impressive. Researchers built an optical diffusion model that uses light, not heavy computer math, to generate images with quality similar to standard AI diffusion but at near-zero power. This work shows a light powered image generator that produces images with almost
0
0
1
@electronickale
Yutong (Kelly) He
3 months
At this point, I propose to cancel rebuttal altogether, you only get one shot, don’t miss your chance to blow, opportunity comes once in a lifetime, bro
6
1
40
@electronickale
Yutong (Kelly) He
3 months
My rebuttal a year ago: Thank you so much for your thoughtful feedback and we would like to address your concerns below… My rebuttal last time: Here are our results. And I won’t thank you because I won’t have space. My rebuttal this time: Trust me bro
2
1
49
@electronickale
Yutong (Kelly) He
4 months
🔥🔥🔥
@sukjun_hwang
Sukjun (June) Hwang
4 months
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
0
0
6
@RickyTQChen
Ricky T. Q. Chen
5 months
Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* perform iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
8
80
518
@electronickale
Yutong (Kelly) He
5 months
Congrats Avi! 🎉🎉🎉
@A_v_i__S
Avi Schwarzschild
5 months
Big news! 🎉 I’m joining UNC-Chapel Hill as an Assistant Professor in Computer Science starting next year! Before that, I’ll be spending time @OpenAI working on LLM privacy. @unccs @uncnlp
0
0
6
@FahimTajwar10
Fahim Tajwar
5 months
RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n
21
146
843
@jmuiuc
Jian Ma
6 months
LLOKI (a variant of Loki):
@jmuiuc
Jian Ma
7 months
Integrating spatial transcriptomics across platforms is hard - different gene panels, sparse data. We introduce LLOKI, using optimal transport + single-cell FMs for unified ST integration. Work led by @ellie_haber (@mldcmu) & Lane Fellow @SpencerKrieger
1
1
8
@electronickale
Yutong (Kelly) He
6 months
When the ddl is approaching and you are violently editing something you wrote a while ago
0
0
17
@electronickale
Yutong (Kelly) He
6 months
🧩 Multi-concept generation becomes intuitive too! PRISM lets you easily identify and combine different components from reference images into coherent scenes. 7/🧵
1
0
2
@electronickale
Yutong (Kelly) He
6 months
✏️ Want to tweak your generated image? You can simply edit PRISM-generated prompts directly! Change outfits, poses, or backgrounds: just modify the part of the prompt you want. No more mysterious embeddings or gibberish prompts! 6/🧵
1
0
4
@electronickale
Yutong (Kelly) He
6 months
📊 In our experiments, PRISM outperforms/matches baselines across popular text-to-image models (Stable Diffusion, DALL-E, Midjourney) on T2I personalization, creating human-readable prompts with superior visual accuracy—no more manual prompt engineering headaches! 5/🧵
1
0
2
@electronickale
Yutong (Kelly) He
6 months
🧠 Our solution: Inspired by LLM jailbreaking (yup, really), PRISM iteratively refines human-readable prompts via in-context learning from VLMs. A magic trio, Prompt Engineer Assistant (VLM), any T2I Generator, and Judge (another VLM), powers this feedback loop! 4/🧵
1
0
3
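The feedback loop described in this thread (Prompt Engineer VLM → T2I generator → Judge VLM, refined in-context) can be sketched roughly as below. This is a minimal illustration assuming generic callables for the three components, not the authors' actual PRISM implementation; the function names and the scoring interface are hypothetical stand-ins.

```python
def prism_style_loop(reference_image, propose, generate, judge, n_iters=5):
    """Iteratively refine a human-readable prompt via judged feedback.

    Hypothetical interfaces (stand-ins, not the real PRISM API):
      propose(reference_image, history) -> candidate prompt (str),
          conditioning on past (prompt, score) pairs in-context
      generate(prompt)                  -> rendered image (black-box T2I)
      judge(reference_image, image)     -> similarity score (higher = closer)
    """
    history = []  # (prompt, score) pairs fed back to the proposer
    best_prompt, best_score = None, float("-inf")
    for _ in range(n_iters):
        prompt = propose(reference_image, history)   # Prompt Engineer (VLM)
        image = generate(prompt)                     # any T2I generator
        score = judge(reference_image, image)        # Judge (VLM)
        history.append((prompt, score))
        if score > best_score:                       # keep the best prompt seen
            best_prompt, best_score = prompt, score
    return best_prompt, best_score
```

Because the loop only needs prompt-in, image-out access to the generator, it works with closed models like Midjourney or DALL-E, and the output is a plain-text prompt you can edit by hand afterwards.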