
Rohit Girdhar
@_rohitgirdhar_
Followers: 2K · Following: 486 · Media: 24 · Statuses: 96
Research Scientist at Meta GenAI
New York
Joined September 2018
Super excited to share MovieGen: a new SOTA media generation system! When we started, I didn’t think we’d get this far this quickly. But it turns out that a simplified approach (flow matching), paired with scaling up model size and data, works amazingly well! Details in the paper 😀.
🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in…
Replies: 2 · Retweets: 6 · Likes: 69
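The thread above doesn’t spell out the training objective, but the flow-matching recipe it credits is simple enough to sketch. Below is a minimal, hypothetical training step for linear-interpolant flow matching; the model, dimensions, and names are illustrative stand-ins, not Movie Gen’s actual architecture.

```python
# Minimal flow-matching training step (illustrative only; not Movie Gen's model).
import torch
import torch.nn as nn

class TinyVelocityNet(nn.Module):
    """Toy stand-in for a media-generation backbone: predicts velocity from (x_t, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def flow_matching_loss(model, x0):
    """Linear-interpolant conditional flow matching: regress the velocity (x1 - x0)."""
    x1 = torch.randn_like(x0)                      # noise endpoint
    t = torch.rand(x0.shape[0])                    # uniform time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1  # point on the straight data-to-noise path
    target = x1 - x0                               # constant velocity of that path
    return ((model(x_t, t) - target) ** 2).mean()  # a single MSE term, nothing adversarial

model = TinyVelocityNet(dim=16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x0 = torch.randn(32, 16)                           # stand-in for a batch of (latent) samples
loss = flow_matching_loss(model, x0)
loss.backward()
opt.step()
```

At sampling time the learned velocity field would be integrated from noise back toward data, e.g. with a short Euler loop; scaling the backbone and data is what the tweet points to as the other half of the recipe.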
RT @CMHungSteven: @CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yan…
Replies: 0 · Retweets: 19 · Likes: 0
And check out another paper we just put online: DiTo! A new image/video tokenization approach, trained purely using diffusion, modernizing the tokenization pipeline and making it a lot simpler and more scalable!
Introducing “Diffusion Autoencoders are Scalable Image Tokenizers” (DiTo). We show that with proper designs and scaling up, diffusion autoencoders (a single L2 loss) can outperform the GAN-LPIPS tokenizers (hybrid losses) used in current SOTA generative models. (1/4)
Replies: 0 · Retweets: 5 · Likes: 31
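As a rough illustration of the DiTo framing above (a tokenizer trained purely with a diffusion-style L2 objective, no GAN or LPIPS terms), here is a toy diffusion-autoencoder sketch. The encoder, decoder, and noise schedule are placeholders chosen for brevity and are not the paper’s actual design.

```python
# Toy diffusion autoencoder: encoder -> latent code, decoder trained as a denoiser.
# A single L2 loss trains both parts; names and shapes are illustrative only.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a (flattened) image to a compact latent code, i.e. the 'tokens'."""
    def __init__(self, img_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(img_dim, 512), nn.SiLU(), nn.Linear(512, latent_dim))

    def forward(self, x):
        return self.net(x)

class DenoisingDecoder(nn.Module):
    """Predicts the noise in x_t, conditioned on the latent code z and time t."""
    def __init__(self, img_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + latent_dim + 1, 512), nn.SiLU(), nn.Linear(512, img_dim)
        )

    def forward(self, x_t, z, t):
        return self.net(torch.cat([x_t, z, t[:, None]], dim=-1))

def diffusion_autoencoder_loss(encoder, decoder, x):
    """The only training signal: an L2 denoising loss through the latent bottleneck."""
    z = encoder(x)
    t = torch.rand(x.shape[0])
    noise = torch.randn_like(x)
    x_t = (1 - t[:, None]) * x + t[:, None] * noise   # simple data-to-noise interpolation
    return ((decoder(x_t, z, t) - noise) ** 2).mean()

enc = Encoder(img_dim=64, latent_dim=8)
dec = DenoisingDecoder(img_dim=64, latent_dim=8)
opt = torch.optim.AdamW(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
x = torch.randn(32, 64)                               # stand-in batch of flattened patches
loss = diffusion_autoencoder_loss(enc, dec, x)
loss.backward()
opt.step()
```

The contrast the paper draws is with tokenizers trained on hybrid reconstruction, LPIPS, and GAN losses; here everything reduces to a single regression target, which is what the tweet means by simpler and more scalable.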
RT @dtrinh: VERY excited about the era of generative AR we're bringing to life. Check out this preview! It's early but so damn promising…
Replies: 0 · Retweets: 18 · Likes: 0
Cc @GalChechik, since you were wondering what we’d been up to after the Emu Video work we were just talking about at ECCV 😊.
Replies: 1 · Retweets: 0 · Likes: 2
Excited to share Llama 3.1, which brings multimodal capabilities to your favorite open-source LLM using simple, post-trained adapters! Great experience building w/ our incredible multimodal team, and especially my partners in crime for all things video, @mannat_singh and @filipradenovic!
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models, including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context…
Replies: 2 · Retweets: 4 · Likes: 61
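The recipe in the tweet, a frozen open-source LLM plus small post-trained adapters for new modalities, can be sketched in a few lines. Everything below (module names, toy dimensions, prepending image features as pseudo-tokens) is a hypothetical illustration of that general idea, not the actual Llama 3.1 multimodal design.

```python
# Illustrative adapter sketch: only the adapter is trained; the LLM and vision encoder
# stay frozen. Names, dims, and the pseudo-token scheme are assumptions for the example.
import torch
import torch.nn as nn

class MultimodalAdapter(nn.Module):
    """Projects one vision feature vector into a short sequence of LLM-space embeddings."""
    def __init__(self, vision_dim, llm_dim, num_tokens=8):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim), nn.SiLU(), nn.Linear(llm_dim, llm_dim * num_tokens)
        )

    def forward(self, vision_feats):                        # (B, vision_dim)
        out = self.proj(vision_feats)                       # (B, llm_dim * num_tokens)
        return out.view(vision_feats.shape[0], self.num_tokens, -1)

# Frozen stand-ins for a real vision encoder and the LLM's token embedding table
# (toy vocabulary and dimensions, kept small on purpose).
vision_encoder = nn.Linear(3 * 32 * 32, 256).requires_grad_(False)
llm_embed = nn.Embedding(1000, 512).requires_grad_(False)

adapter = MultimodalAdapter(vision_dim=256, llm_dim=512)    # the only trainable piece
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

images = torch.randn(2, 3 * 32 * 32)                        # flattened toy "images"
text_ids = torch.randint(0, 1000, (2, 16))
with torch.no_grad():
    vis = vision_encoder(images)                            # (2, 256)
    txt = llm_embed(text_ids)                               # (2, 16, 512)
img_tokens = adapter(vis)                                   # (2, 8, 512)
llm_inputs = torch.cat([img_tokens, txt], dim=1)            # image pseudo-tokens + text
# llm_inputs would be fed to the frozen LLM; gradients only flow into `adapter`.
```

Post-training only a component like this leaves the base LLM’s text behavior untouched, which is the appeal of the adapter approach the tweet describes.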
@xiaolonw @liliyu_lili @p_bojanowski Panel discussion with @_tim_brooks @xiaolonw @hila_chefer @liliyu_lili @antoine77340
Replies: 0 · Retweets: 0 · Likes: 2
@xiaolonw @liliyu_lili And the final talk of the day by @p_bojanowski on self-supervised learning of vision transformers!
Replies: 1 · Retweets: 0 · Likes: 0