Sihyun Yu Profile
Sihyun Yu

@sihyun_yu

Followers 1K · Following 1K · Media 13 · Statuses 164

Visiting Scholar @NYU_Courant Intern @NVIDIAAI | PhD @ KAIST | Ex-intern @NVIDIAAI and @GoogleAI | Generative models | https://t.co/wTvMmsjUdG

Daejeon
Joined July 2020
@sihyun_yu
Sihyun Yu
1 year
Introducing REPA! We show that learning high-quality representations in diffusion transformers is crucial for boosting generation performance. With REPA, we speed up SiT training by 17.5x (without CFG) and achieve state-of-the-art FID = 1.42 using CFG with the guidance interval.
6
46
285
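The alignment idea in the REPA announcement above can be pictured with a toy loss: project the diffusion transformer's hidden states into a pretrained encoder's feature space and pull them toward the frozen features. This is a minimal numpy sketch under my own assumptions (linear projection, cosine-similarity objective, illustrative shapes), not the REPA implementation:

```python
import numpy as np

def repa_alignment_loss(hidden, target_feats, proj):
    """Toy REPA-style objective: project DiT hidden states into the
    pretrained encoder's feature space and maximize per-token cosine
    similarity with the frozen features. (The linear projection and
    all shapes here are illustrative assumptions.)"""
    z = hidden @ proj                                   # (tokens, enc_dim)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    y = target_feats / np.linalg.norm(target_feats, axis=-1, keepdims=True)
    return -(z * y).sum(axis=-1).mean()                 # negative mean cosine

# usage: 4 tokens, DiT width 8, encoder width 6
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
feats = rng.normal(size=(4, 6))
proj = rng.normal(size=(8, 6))
loss = repa_alignment_loss(hidden, feats, proj)         # lies in [-1, 1]
```

In the real method this term is added to the diffusion loss; perfectly aligned features drive it to -1.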
@boyangzheng_
Boyang Zheng
16 hours
Introducing Representation Autoencoders (RAE)! We revisit the latent space of Diffusion Transformers, replacing VAE with RAE: pretrained representation encoders (DINOv2, SigLIP2) paired with trained ViT decoders. (1/n)
5
44
375
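The encoder-frozen / decoder-trained split the RAE tweet describes can be sketched with numpy stand-ins: a fixed linear map plays the pretrained representation encoder and a closed-form least-squares decoder plays the trained ViT decoder. All names and shapes here are illustrative assumptions, not the RAE codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "representation encoder" (stand-in for DINOv2 / SigLIP2):
# a fixed linear map from 16-dim "pixels" to 8-dim features; never trained.
W_enc = rng.normal(size=(16, 8))

def encode(x):
    return x @ W_enc

# Only the decoder is trained to reconstruct inputs from frozen features
# (here fit in closed form by least squares; RAE trains a ViT decoder).
X = rng.normal(size=(100, 16))          # toy dataset
Z = encode(X)
W_dec, *_ = np.linalg.lstsq(Z, X, rcond=None)
recon = Z @ W_dec                       # reconstruction from frozen latents
```

The point of the split: the latent space inherits the pretrained encoder's semantics, and only the map back to pixels is learned.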
@sainingxie
Saining Xie
16 hours
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. šŸ‘‡(1/n)
37
223
1K
@ma_nanye
Willis (Nanye) Ma
16 hours
Excited to introduce DiffuseNNX, a comprehensive JAX/Flax NNX-based library for diffusion and flow matching! It supports multiple diffusion / flow-matching frameworks, Autoencoders, DiT variants, and sampling algorithms. Repo: https://t.co/zOcA6nyrcM Delve into details below!
github.com
A comprehensive JAX/NNX library for diffusion and flow matching generative algorithms, featuring DiT (Diffusion Transformer) and its variants as the primary backbone with support for ImageNet train...
3
42
156
@Almondgodd
anandmaj
19 days
I spent the past month reimplementing DeepMind's Genie 3 world model from scratch. Ended up making TinyWorlds, a 3M-parameter world model capable of generating playable game environments. Demo below + everything I learned in thread (full repo at the end) šŸ‘‡šŸ¼
97
267
2K
@sainingxie
Saining Xie
2 months
I know op is click-baiting, but let me bite... fwiw every researcher’s DREAM is to find out their architecture is wrong. If it’s never wrong, that’s a bigger problem. we try to break DiT every day w/ SiT, REPA, REPA-E etc. but you gotta form hypotheses, run experiments, test, not
@sameQCU
ć‚µćƒ”QCU
2 months
bros, DiT is wrong. it's mathematically wrong. it's formally wrong. there is something wrong with it
12
56
544
@2prime_PKU
Yiping Lu
3 months
Anyone know Adam?
268
452
5K
@_akhaliq
AK
3 months
Enhancing Motion Dynamics of Image-to-Video Models via Adaptive Low-Pass Guidance
3
21
185
@SoojungYang2
Soojung Yang
3 months
šŸš€ Come check our poster at ICML @genbio_workshop! We show that pretrained MLIPs can accelerate training of Boltzmann emulators — by aligning their internal representations. Coauthors @LucasPinede, @junonam_, @RGBLabMIT (1/n)
2
18
151
@sihyun_yu
Sihyun Yu
3 months
I’ve wondered why I2V models tend to generate more static videos compared to their T2V counterparts. This project, led by @june_suk_choi, provides an analysis of this phenomenon and introduces a very simple (yet effective) fix to address it! Excited to have been part of this
@june_suk_choi
June Suk Choi
3 months
Excited to share Adaptive Low-Pass Guidance (ALG): a simple training-free, drop-in fix that brings dynamic motion back to Image-to-Video models! Demo videos, paper, & code below! https://t.co/4NzYDfCFSb (🧵 1/7)
0
2
29
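One way to picture a training-free low-pass fix like the ALG thread describes: heavily low-pass the conditioning image at early denoising steps, then restore full detail by the final step. This toy numpy sketch is my assumption of such a schedule (the FFT mask and the linear ramp are illustrative, not the paper's exact filter):

```python
import numpy as np

def low_pass(img, keep_frac):
    """Keep only the lowest keep_frac of spatial frequencies via an FFT
    mask; keep_frac=1.0 returns the image unchanged."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    r = np.maximum(np.abs(yy) / (h / 2), np.abs(xx) / (w / 2))
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * (r <= keep_frac))))

def alg_conditioning(img, step, n_steps, min_frac=0.2):
    """Hypothetical ALG-style schedule: the conditioning image is blurred
    hard at step 0 and passed through untouched at the last step."""
    keep = min_frac + (1.0 - min_frac) * step / (n_steps - 1)
    return low_pass(img, keep)
```

Intuitively, withholding high frequencies early keeps the model from locking onto the static input, which matches the motion-vs-fidelity trade-off the thread discusses.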
@sainingxie
Saining Xie
3 months
@joserf28323 @CVPR @ICCVConference @nyuniversity Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be
10
28
215
@sihyun_yu
Sihyun Yu
4 months
Excited to share MDMs for molecule generation led by @bellaseo72 and @taewonKKK!
@bellaseo72
Hyunjin Seo
4 months
Meet MELD: a masked diffusion model (MDM) designed for de novo molecule generation. MELD assigns a per-element learnable noise schedule that tailors noise at the atom & bond level to avoid the state-clashing problem. With MELD we achieve state-of-the-art property alignment in
0
1
11
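The per-element schedule idea reads naturally as each atom or bond carrying its own masking rate instead of one shared curve. A toy sketch (the exponential form and the `rates` parameterization are my assumptions for illustration, not MELD's actual schedule):

```python
import numpy as np

def mask_probs(t, rates):
    """Per-element masking probability at time t in [0, 1]: element i is
    masked with probability 1 - exp(-rates[i] * t), so atoms and bonds can
    carry different (learnable) noise rates rather than one shared schedule."""
    return 1.0 - np.exp(-np.asarray(rates, dtype=float) * t)

# usage: three elements with increasingly aggressive schedules
p = mask_probs(0.5, [0.5, 2.0, 8.0])
```

Elements with higher rates are masked earlier in the forward process, which is the knob a learnable per-element schedule can tune.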
@ZhengyangGeng
Zhengyang Geng
4 months
now the code is up here:
github.com
JAX implementation of MeanFlow. Contribute to Gsunshine/meanflow development by creating an account on GitHub.
@ZhengyangGeng
Zhengyang Geng
5 months
Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In a word, we show an ā€œidentity learningā€ approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,
2
17
71
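The identity the tweet alludes to can be written out explicitly (my transcription of the MeanFlow relation between average and instantaneous velocity; notation may differ from the paper's):

```latex
% Average velocity over [r, t] along the flow z_\tau:
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau
% Differentiating (t - r)\,u(z_t, r, t) with respect to t gives
% the "identity learning" target relating the two velocities:
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt}\, u(z_t, r, t)
```

Training the model to satisfy this identity is what lets a single network represent the average velocity directly.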
@wenhaocha1
Wenhao Chai
4 months
We introduce LiveCodeBench Pro. Models like o3-high, o4-mini, and Gemini 2.5 Pro score 0% on hard competitive programming problems.
5
28
191
@ArashVahdat
Arash Vahdat
4 months
The slides for my CVPR talks are now available at
latentspace.cc
Arash Vahdat is a Research Director, leading the fundamental generative AI research (GenAIR) team at NVIDIA Research. Before joining NVIDIA, he was a research scientist at D-Wave Systems where he...
@ArashVahdat
Arash Vahdat
4 months
I'm giving 3 talks at #CVPR2025 workshops and tutorials: 1⃣ "Rare Yet Real: Generative Modeling Beyond the Modes" will cover some of our work on gen AI for science where tail modeling and predictor calibration are crucial (Wed 11:10 - Room 102 B). https://t.co/IqDuwOXY2W
3
19
163
@CVPR
#CVPR2026
4 months
#CVPR2025 PAMI-TC awards
1
15
103
@RickyTQChen
Ricky T. Q. Chen
4 months
Padding in our non-AR sequence models? Yuck. šŸ™… šŸ‘‰ Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
8
80
518
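The insert/delete refinement above can be pictured with a toy edit applier (the op encoding is my assumption for illustration; Edit Flows itself learns position-relative edit rates, not this list format):

```python
def apply_edits(seq, edits):
    """Apply insert/delete operations left to right against the *current*
    sequence, so each position is relative to earlier edits' effects.
    edits: list of ("ins", pos, token) or ("del", pos)."""
    out = list(seq)
    for op in edits:
        if op[0] == "ins":
            out.insert(op[1], op[2])
        else:                      # "del"
            out.pop(op[1])
    return out

# usage: delete "b", then insert "x" at the front
result = apply_edits(["a", "b", "c"], [("del", 1), ("ins", 0, "x")])
```

The output length differs from the input's with no mask or pad tokens involved, which is the point of edit-based generation for variable-length sequences.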
@sainingxie
Saining Xie
4 months
Had a great time at this CVPR community-building workshop---lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: https://t.co/T5FZS1A3CT
@anand_bhattad
Anand Bhattad
4 months
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209 Schedule: https://t.co/1fKzplQrU5 We have an exciting lineup of invited talks and candid
18
66
353
@ma_nanye
Willis (Nanye) Ma
4 months
Join us for a full-day tutorial on Scalable Generative Models in Computer Vision at @CVPR in Nashville, on Wednesday, June 11, from 9:00 AM to 5:00 PM in Room 202 B! šŸ‘‰ We are honored to have @sainingxie, @deeptigp, @thoma_gu, Kaiming He, @ArashVahdat, and @sherryyangML to
2
21
81
@younggyoseo
Younggyo Seo
5 months
Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with an open-source code to run your own humanoid RL experiments in no time! Thread below 🧵
15
118
563
@sainingxie
Saining Xie
5 months
Indeed. For text-to-image, @xichen_pan had a great summary supporting this decoupled design philosophy: "Render unto diffusion what is generative, and unto LLMs what is understanding." We've repeatedly observed that diffusion gradients can negatively impact the backbone repr.
@YouJiacheng
You Jiacheng
5 months
as expected, this matches findings in unified multimodal understanding and generation models by @sainingxie: frozen VLM might help you. https://t.co/AwGBiNdN6R
12
36
227