Wenhao Chai

@wenhaocha1

Followers: 2K
Following: 7K
Media: 55
Statuses: 732

Ph.D. Student @PrincetonCS with @liuzhuang1234. Prev @Stanford @UW @pika_labs @MSFTResearch @UofIllinois. I work on computer vision and more.

Princeton, NJ (NYC at times)
Joined January 2022
@RichardYRLi
Yingru Li
2 days
@danielhanchen, glad you liked the post! You're spot on to suspect lower-level implementation issues. That's exactly what we found in the original blog. The disable_cascade_attn finding (Sec 4.2.4) was the symptom, but the root cause was that silent FlashAttention-2 kernel bug
@danielhanchen
Daniel Han
2 days
@_arohan_ :) Original plots come from https://t.co/KOBqOoaeLq - also their blog is super good! - still unsure if the FP16 vs BF16 debate is due to hardware issues with FP32 accumulation sizes - planning to run some experiments!
8
24
338
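A minimal, hypothetical sketch (not from the thread) of one precision trade-off behind the FP16 vs BF16 debate: per-element rounding error against an fp32 reference. bf16 keeps fp32's exponent range but has fewer mantissa bits, so it rounds more coarsely than fp16.

```python
# Hypothetical illustration (not from the thread): compare rounding error
# when casting fp32 values down to fp16 vs bf16 and back up.
import torch

torch.manual_seed(0)
x = torch.rand(1_000_000, dtype=torch.float32)  # fp32 reference values in [0, 1)

for dtype in (torch.float16, torch.bfloat16):
    roundtrip = x.to(dtype).to(torch.float32)      # cast down, then back up
    err = (x - roundtrip).abs().mean().item()      # mean absolute rounding error
    print(f"{dtype}: mean abs rounding error = {err:.3e}")
```

This only probes representation error; whether a given kernel accumulates in fp32 would need separate experiments, as the tweet suggests.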
@thjashin
Jiaxin Shi
5 days
Had fun contributing a bit to this project! I especially liked this - masked diffusion (any-order generation) can be better than fixed-order AR on problems without a canonical ordering
@TZahavy
Tom Zahavy
6 days
I am excited to share a work we did in the Discovery team at @GoogleDeepMind using RL and generative models to discover creative chess puzzles 🔊♟️♟️ #neurips2025 🎨 While strong chess players intuitively recognize the beauty of a position, articulating the precise elements that
2
5
50
@jbhuang0604
Jia-Bin Huang
6 days
How to organize your talk? I used to present like this, thinking that I was being "academic", "organized", and "professional". BUT, from the audience's viewpoint, this sucks. 😱 Look how long they need to hold long-term context just to make sense of what you're saying!
5
24
338
@RidgerZhu
Rui-Jie (Ridger) Zhu
5 days
Thrilled to release our new paper: “Scaling Latent Reasoning via Looped Language Models.” TLDR: We scale looped language models up to 2.6 billion parameters and pretrain them on >7 trillion tokens. The resulting model is on par with SOTA language models 2 to 3x its size.
20
117
573
@thoma_gu
Jiatao Gu
6 days
Might also be interested in checking our TARFlow series! TARFlow: https://t.co/Gb7NETqEw2 ICML2025 Oral STARFlow: https://t.co/bpkY7SYx4z NeurIPS2025 Spotlight TARFlow-LM: https://t.co/BLHoXt9m5Q NeurIPS 2025 … and more maybe soon🤖
arxiv.org
Autoregressive models have driven remarkable progress in language modeling. Their foundational reliance on discrete tokens, unidirectional context, and single-pass decoding, while central to their...
@jm_alexia
Alexia Jolicoeur-Martineau
6 days
Normalizing Flow is back!
0
10
100
@tanishqkumar07
Tanishq Kumar
7 days
Please steal my AI research ideas. This is a list of research questions and concrete experiments I would love to see done, but don't have bandwidth to get to. If you are looking to break into AI research (e.g. as an undergraduate, or a software engineer in industry), these are
47
203
2K
@BoLi68567011
Brian Bo Li
8 days
@wenhaocha1 Thanks, Wenhao! Really appreciate your recognition, and I was really lucky to meet you back in the early days when we were all starting to develop multimodal models - so many new models, datasets, and discussions, bringing new insights to everyone. From lmms-eval to
0
1
2
@wenhaocha1
Wenhao Chai
8 days
Back in 2024, LMMs-Eval built a complete evaluation ecosystem for the MLLM/LMM community, with countless researchers contributing their models and benchmarks to raise the whole edifice. I was fortunate to be one of them: our series of video-LMM works (MovieChat, AuroraCap, VDC)
@BoLi68567011
Brian Bo Li
10 days
Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full
2
3
29
@wenhaocha1
Wenhao Chai
8 days
world model
@Tesla
Tesla
10 days
To push self-driving into situations wilder than reality, we built a neural network world simulator that can create entirely synthetic worlds for the Tesla to drive in. Video below is fully generated & not a real video
5
9
210
@Diyi_Yang
Diyi Yang
10 days
Stanford NLP 25th Anniversary🤩🤩🤩
@stanfordnlp
Stanford NLP Group
10 days
Today, we’re overjoyed to have a 25th Anniversary Reunion of @stanfordnlp. So happy to see so many of our former students back at @Stanford. And thanks to @StanfordHAI for the venue!
9
39
600
@BoLi68567011
Brian Bo Li
10 days
Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full
9
33
104
@markchen90
Mark Chen
13 days
@josh1yan I joined @OpenAI as a resident. First, get the fundamentals down. If there's one subject you need to know inside and out, it's linear algebra. Read and understand a classic textbook like Bishop's Pattern Recognition and Machine Learning. Then, take on an ambitious project. I
7
30
700
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
13 days
end-to-end training just makes latent diffusion transformers better! with repa-e, we showed the power of end-to-end training on imagenet. today we are extending it to text-to-image (T2I) generation. #ICCV2025 🌴 🚨 Introducing "REPA-E for T2I: family of end-to-end tuned VAEs for
1
17
42
@ziqiao_ma
Martin Ziqiao Ma
14 days
Congrats to FlowEdit for winning #ICCV2025 Best Student Paper. “Inversion-free” is a very cool idea. We proposed the first inversion-free, optimization-free, and model-agnostic framework (for latent diffusion and consistency models) back at CVPR 2024 ( https://t.co/zMrIfyVFpq).
@ziqiao_ma
Martin Ziqiao Ma
2 years
Want to edit your image with language descriptions in less than 3s? Ever questioned the need for prolonged inversion in text-guided editing? We are happy to release ♾ InfEdit (with demo), a flexible framework for fast, faithful and consistent editing. 🔗 https://t.co/NwZvoEh7ho
4
42
297
@ziqiao_ma
Martin Ziqiao Ma
14 days
I’ve always wanted to write an open-notebook research blog to (i) show the chain of thought behind how we formed hypotheses, designed experiments, and articulated findings, and (ii) lay out all the intermediate results that did not make it into the final paper, including negative
4
42
221
@wenhaocha1
Wenhao Chai
15 days
Our paper Video-MMLU has been awarded Outstanding Paper at the ICCV Workshop! I happened to receive this wonderful news while soaking in the water, and I couldn't be happier! Huge thanks to the Knowledge-Intensive Multimodal Reasoning Workshop Committee for the honor.
@EnxinSong
Enxin Song
6 months
🎉 Introducing Video-MMLU, a new benchmark for evaluating large multimodal models on classroom-style lectures in math, physics, and chemistry! 🧑‍🏫📚Video-MMLU requires strong reasoning capabilities and world knowledge compared to the previous benchmarks for video LMMs.
4
7
79
@wenhaocha1
Wenhao Chai
19 days
LiveCodeBench Pro remains one of the most challenging code benchmarks, but its evaluation and verification process is still a black box. We introduce AutoCode, which democratizes evaluation by allowing anyone to run verification locally and perform RL training! For the first time,
4
29
124
@jihanyang13
Jihan Yang
27 days
So excited to be part of the team bringing the 1st Multimodal Spatial Intelligence (MUSI) workshop to @ICCVConference, with a huge shout-out to @songyoupeng for leading the effort! We've put together an incredible program. If you'll be at ICCV, you should definitely stop by! 🗓️
@songyoupeng
Songyou Peng
27 days
📣 Announcing MUSI: 1st Multimodal Spatial Intelligence Workshop @ICCVConference! 🎙️All-star keynotes: @sainingxie, @ManlingLi_, @RanjayKrishna, @yuewang314, and @QianqianWang5 - plus a panel on the future of the field! 🗓 Oct 20, 1pm-5:30pm HST 🔗 https://t.co/wZaWKRIcYI
0
6
29
@TongPetersb
Peter Tong
21 days
The work opened my eyes. Since my PhD, I've been studying visual representations for understanding and generation. I long thought pretrained vision encoders (CLIP, DINO, etc.) produced features too semantic for generation/reconstruction, but that's not true! These features
@sainingxie
Saining Xie
21 days
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
13
44
486