
Sihyun Yu
@sihyun_yu
Followers: 1K · Following: 1K · Media: 13 · Statuses: 164
Visiting Scholar @NYU_Courant | Intern @NVIDIAAI | PhD @ KAIST | Ex-intern @NVIDIAAI and @GoogleAI | Generative models | https://t.co/wTvMmsjUdG
Daejeon
Joined July 2020
Introducing REPA! We show that learning high-quality representations in diffusion transformers is crucial for boosting generation performance. With REPA, we speed up SiT training by 17.5x (without CFG) and achieve state-of-the-art FID = 1.42 using CFG with the guidance interval.
6 replies · 46 reposts · 285 likes
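For readers curious how the alignment works in practice: REPA adds a regularizer that pulls intermediate DiT/SiT features toward a frozen pretrained encoder's patch embeddings. A minimal PyTorch-style sketch, assuming illustrative names (`dino`, `projector`, `denoiser_feats`) rather than the released code:

```python
import torch
import torch.nn.functional as F

def repa_loss(denoiser_feats, clean_images, dino, projector):
    """Representation-alignment regularizer (REPA-style sketch).

    denoiser_feats: patch features from an intermediate DiT/SiT block,
                    shape (B, N, D_dit), computed on the noisy input.
    clean_images:   corresponding clean images, shape (B, 3, H, W).
    dino:           frozen pretrained encoder (e.g. DINOv2) returning
                    patch embeddings of shape (B, N, D_dino).
    projector:      small trainable MLP mapping D_dit -> D_dino.
    """
    with torch.no_grad():                 # target encoder stays frozen
        target = dino(clean_images)       # (B, N, D_dino)
    pred = projector(denoiser_feats)      # (B, N, D_dino)
    # negative cosine similarity, averaged over patches and the batch
    return -F.cosine_similarity(pred, target, dim=-1).mean()

# total objective: usual denoising loss plus the alignment term
# loss = denoising_loss + lambda_repa * repa_loss(...)
```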
Introducing Representation Autoencoders (RAE)! We revisit the latent space of Diffusion Transformers, replacing VAE with RAE: pretrained representation encoders (DINOv2, SigLIP2) paired with trained ViT decoders. (1/n)
5 replies · 44 reposts · 375 likes
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. (1/n)
37 replies · 223 reposts · 1K likes
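The core swap is easy to picture: the pretrained representation encoder is frozen and defines the latent space, and only the ViT decoder is trained. A minimal sketch under those assumptions (class and argument names are mine, not the release):

```python
import torch.nn as nn

class RepresentationAutoencoder(nn.Module):
    """RAE-style sketch: frozen pretrained encoder + trainable decoder.

    `pretrained_encoder` (e.g. DINOv2 or SigLIP2) is kept frozen and
    defines the latent space; only `decoder` (a ViT) learns to map
    patch embeddings back to pixels.
    """

    def __init__(self, pretrained_encoder, decoder):
        super().__init__()
        self.encoder = pretrained_encoder.eval()
        for p in self.encoder.parameters():   # encoder is never updated
            p.requires_grad_(False)
        self.decoder = decoder                # trainable ViT decoder

    def encode(self, images):
        return self.encoder(images)           # (B, N, D) patch latents

    def decode(self, latents):
        return self.decoder(latents)          # (B, 3, H, W) reconstruction

# training: reconstruction loss through the decoder only; the diffusion
# transformer is then trained in the frozen encoder's latent space.
```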
Excited to introduce DiffuseNNX, a comprehensive JAX/Flax NNX-based library for diffusion and flow matching! It supports multiple diffusion / flow-matching frameworks, Autoencoders, DiT variants, and sampling algorithms. Repo: https://t.co/zOcA6nyrcM Delve into details below!
github.com
A comprehensive JAX/NNX library for diffusion and flow matching generative algorithms, featuring DiT (Diffusion Transformer) and its variants as the primary backbone with support for ImageNet train...
3 replies · 42 reposts · 156 likes
I spent the past month reimplementing DeepMind's Genie 3 world model from scratch. Ended up making TinyWorlds, a 3M parameter world model capable of generating playable game environments. Demo below + everything I learned in thread (full repo at the end)
97 replies · 267 reposts · 2K likes
I know op is click-baiting, but let me bite... fwiw every researcher's DREAM is to find out their architecture is wrong. If it's never wrong, that's a bigger problem. we try to break DiT every day w/ SiT, REPA, REPA-E etc. but you gotta form hypotheses, run experiments, test, not …
bros, DiT is wrong. it's mathematically wrong. it's formally wrong. there is something wrong with it
12 replies · 56 reposts · 544 likes
Enhancing Motion Dynamics of Image-to-Video Models via Adaptive Low-Pass Guidance
3 replies · 21 reposts · 185 likes
Come check our poster at ICML @genbio_workshop! We show that pretrained MLIPs can accelerate training of Boltzmann emulators by aligning their internal representations. Coauthors @LucasPinede, @junonam_, @RGBLabMIT (1/n)
2 replies · 18 reposts · 151 likes
I've wondered why I2V models tend to generate more static videos compared to their T2V counterparts. This project, led by @june_suk_choi, provides an analysis of this phenomenon and introduces a very simple (yet effective) fix to address it! Excited to have been part of this.
Excited to share Adaptive Low-Pass Guidance (ALG): a simple training-free, drop-in fix that brings dynamic motion back to Image-to-Video models! Demo videos, paper, & code below! https://t.co/4NzYDfCFSb (1/7)
0 replies · 2 reposts · 29 likes
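The fix is easy to sketch: low-pass filter the conditioning image during early denoising steps so the model cannot lock onto fine static detail, then fade the filter out. A hedged sketch; the linear schedule and blur strength below are my assumptions, not the paper's exact recipe:

```python
import torchvision.transforms.functional as TF

def alg_condition(image, step, num_steps, max_sigma=5.0):
    """Adaptive low-pass guidance sketch: blur the conditioning image
    early in sampling, then fade the blur out over the trajectory."""
    frac = step / max(num_steps - 1, 1)   # 0 at the first step, 1 at the last
    sigma = max_sigma * (1.0 - frac)      # strong low-pass early, none late
    if sigma <= 0.1:
        return image                      # late steps: full-detail condition
    k = int(2 * round(3 * sigma) + 1)     # odd kernel covering ~3 sigma
    return TF.gaussian_blur(image, kernel_size=k, sigma=sigma)

# inside the I2V sampling loop (pseudocode):
# for step in range(num_steps):
#     cond = alg_condition(first_frame, step, num_steps)
#     x = denoise_step(x, cond, step)
```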
@joserf28323 @CVPR @ICCVConference @nyuniversity Thanks for bringing this to my attention. I honestly wasn't aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this. If I were serving as an Area Chair, any paper with this kind of prompt would be …
10 replies · 28 reposts · 215 likes
Excited to share MDMs for molecule generation led by @bellaseo72 and @taewonKKK!
Meet MELD: a masked diffusion model (MDM) designed for de novo molecule generation. MELD assigns a per-element learnable noise schedule that tailors noise at the atom & bond level to avoid the state-clashing problem. With MELD we achieve state-of-the-art property alignment in …
0 replies · 1 repost · 11 likes
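A rough sketch of what a per-element noise schedule can look like in a masked diffusion model; the exponential parameterization and `MASK_ID` below are illustrative assumptions, not MELD's exact design:

```python
import torch

def per_element_mask(tokens, t, log_rate):
    """Per-element noise schedule sketch for a masked diffusion model.

    tokens:   (B, N) discrete atom/bond tokens.
    t:        (B, 1) diffusion time in [0, 1].
    log_rate: (N,) learnable per-element log-rates, so each atom/bond
              position gets its own masking speed (illustrative form).
    Returns tokens with masked positions replaced by MASK_ID.
    """
    MASK_ID = 0                              # assumed mask-token id
    rate = torch.exp(log_rate)               # positive per-element rate
    # survival probability decays at an element-specific speed
    keep_prob = torch.exp(-rate * t)         # (B, N) via broadcasting
    masked = torch.rand_like(keep_prob) > keep_prob
    return torch.where(masked, torch.full_like(tokens, MASK_ID), tokens)
```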
now the code is up here:
github.com
JAX implementation of MeanFlow. Contribute to Gsunshine/meanflow development by creating an account on GitHub.
Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In a word, we show an "identity learning" approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model, …
2 replies · 17 reposts · 71 likes
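For reference, the identity being learned, reconstructed in my own notation from the tweet's description of average vs. instantaneous velocity; the exact form in the paper may differ:

```latex
% Average velocity u over [r, t], with v the instantaneous velocity:
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_s, s)\, \mathrm{d}s
% Differentiating (t - r)\,u(z_t, r, t) with respect to t gives the
% identity the model is trained to satisfy:
u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{\mathrm{d}}{\mathrm{d}t}\,u(z_t, r, t),
\qquad
\frac{\mathrm{d}u}{\mathrm{d}t} = v(z_t, t)\,\partial_z u + \partial_t u .
```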
We introduce LiveCodeBench Pro. Models like o3-high, o4-mini, and Gemini 2.5 Pro score 0% on hard competitive programming problems.
5 replies · 28 reposts · 191 likes
The slides for my CVPR talks are now available at
latentspace.cc
Arash Vahdat is a Research Director, leading the fundamental generative AI research (GenAIR) team at NVIDIA Research. Before joining NVIDIA, he was a research scientist at D-Wave Systems where he...
I'm giving 3 talks at #CVPR2025 workshops and tutorials: 1) "Rare Yet Real: Generative Modeling Beyond the Modes" will cover some of our work on gen AI for science where tail modeling and predictor calibration are crucial (Wed 11:10 - Room 102 B). https://t.co/IqDuwOXY2W
3 replies · 19 reposts · 163 likes
Padding in our non-AR sequence models? Yuck.
Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
8 replies · 80 reposts · 518 likes
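To make the operation concrete, here is a toy sketch of position-relative inserts and deletes on a token sequence; the op format is my own illustration, not the Edit Flows parameterization:

```python
def apply_edits(seq, edits):
    """Apply insert/delete edit operations to a sequence.

    edits: list of ("ins", pos, token) or ("del", pos) tuples, applied
    right-to-left so earlier positions stay valid after each edit.
    """
    out = list(seq)
    for op in sorted(edits, key=lambda e: e[1], reverse=True):
        if op[0] == "ins":
            _, pos, tok = op
            out.insert(pos, tok)
        elif op[0] == "del":
            _, pos = op
            del out[pos]
    return out

# one refinement step grows or shrinks the sequence freely,
# with no padding or mask tokens needed:
print(apply_edits(["A", "C", "G"], [("ins", 1, "T"), ("del", 2)]))
# ['A', 'T', 'C']  (delete 'G' at position 2, then insert 'T' at position 1)
```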
Had a great time at this CVPR community-building workshop: lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: https://t.co/T5FZS1A3CT
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209. Schedule: https://t.co/1fKzplQrU5 We have an exciting lineup of invited talks and candid …
18 replies · 66 reposts · 353 likes
Join us for a full-day tutorial on Scalable Generative Models in Computer Vision at @CVPR in Nashville, on Wednesday, June 11, from 9:00 AM to 5:00 PM in Room 202 B! We are honored to have @sainingxie, @deeptigp, @thoma_gu, Kaiming He, @ArashVahdat, and @sherryyangML to …
2 replies · 21 reposts · 81 likes
Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below
15 replies · 118 reposts · 563 likes
Indeed. For text-to-image, @xichen_pan had a great summary supporting this decoupled design philosophy: "Render unto diffusion what is generative, and unto LLMs what is understanding." We've repeatedly observed that diffusion gradients can negatively impact the backbone repr.
as expected, this matches findings in unified multimodal understanding and generation models by @sainingxie: frozen VLM might help you. https://t.co/AwGBiNdN6R
12 replies · 36 reposts · 227 likes
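The decoupling both tweets point at is simple to express: block the generative gradient path into the understanding backbone. A minimal sketch with illustrative names, not any specific model's API:

```python
import torch

def generation_step(vlm, diffusion_head, images, text):
    """Decoupled design sketch: a frozen VLM provides conditioning
    features; only the diffusion head receives gradients."""
    with torch.no_grad():                 # diffusion gradients never
        cond = vlm(images, text)          # reach the VLM backbone
    return diffusion_head(cond)           # only this part is trained

# equivalently, with a trainable backbone one can still block the
# generative gradient path: cond = vlm(images, text).detach()
```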