
Rohit Girdhar
@_rohitgirdhar_
2K Followers · 499 Following · 24 Media · 97 Statuses
Research Scientist at Meta GenAI
New York
Joined September 2018
Super excited to share Movie Gen: a new SOTA media generation system! When we started, I didn’t think we’d get this far this quickly. But it turns out a simplified approach (flow matching), paired with scaling up model size and data, works amazingly well! Details in the paper 😀.
🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to-date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in
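For readers curious about the objective the tweet credits: conditional flow matching trains a network to regress the velocity of a straight path from noise to data, replacing the usual diffusion training machinery with a single regression loss. Below is a minimal, generic sketch of that objective, assuming a hypothetical `velocity_model(x_t, t)` network; it is not Movie Gen’s actual training code.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(velocity_model, x1):
    """One conditional flow-matching training step: regress the constant
    velocity of the straight path from a noise sample x0 to a data sample x1."""
    x0 = torch.randn_like(x1)                       # noise endpoint of the path
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)),
                   device=x1.device)                # per-sample time in [0, 1]
    xt = (1 - t) * x0 + t * x1                      # point along the straight path
    target_v = x1 - x0                              # the path's (constant) velocity
    pred_v = velocity_model(xt, t)                  # hypothetical network
    return F.mse_loss(pred_v, target_v)
```

Sampling then just integrates the learned velocity from t=0 to t=1 with an ODE solver (e.g., a few Euler steps), which is part of why the approach is considered simpler than classic diffusion.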
RT @CMHungSteven: @CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yan…
And check out another paper we just put online: DiTo! A new image/video tokenization approach, trained purely with diffusion, that modernizes the tokenization pipeline and makes it a lot simpler and more scalable!
Introducing “Diffusion Autoencoders are Scalable Image Tokenizers” (DiTo). We show that with proper designs and scaling up, diffusion autoencoders (a single L2 loss) can outperform the GAN-LPIPS tokenizers (hybrid losses) used in current SOTA generative models. (1/4)
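In plainer terms, the claim is that a tokenizer’s encoder and a diffusion decoder can be trained end-to-end with one denoising regression loss, dropping the GAN and LPIPS terms entirely. A rough sketch of that setup, with hypothetical `encoder` and `denoiser` modules (the actual architectures and design choices are in the paper):

```python
import torch
import torch.nn.functional as F

def diffusion_autoencoder_loss(encoder, denoiser, x):
    """Sketch of a diffusion-autoencoder tokenizer objective: the encoder's
    latent z conditions a denoiser, and a single L2 loss trains both."""
    z = encoder(x)                                  # compact latent "tokens"
    x0 = torch.randn_like(x)                        # noise endpoint
    t = torch.rand(x.shape[0], *([1] * (x.dim() - 1)), device=x.device)
    xt = (1 - t) * x0 + t * x                       # corrupted input
    pred = denoiser(xt, t, z)                       # denoise, conditioned on z
    return F.mse_loss(pred, x - x0)                 # one L2 loss; no GAN, no LPIPS
```

The gradient of the denoising loss flows back through z into the encoder, so useful latents emerge without any auxiliary reconstruction or adversarial objectives.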
And that’s not all! It performed surprisingly competitively on image/video/audio captioning, and could even perform style transfer and cross-modal arithmetic. Check out all the details in our paper, and our code:
github.com
Code release for "LLMs can see and hear without any training" - facebookresearch/MILS
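At a high level the method is training-free: an off-the-shelf LLM proposes candidate captions, a pretrained image-text scorer ranks them against the input, and the top candidates are fed back to the LLM as context for the next round. A schematic sketch of that loop, with placeholder `llm_propose` and `scorer` callables (the actual prompts and scoring models are in the repo):

```python
def mils_style_caption(llm_propose, scorer, image, steps=5, keep=8):
    """Training-free captioning via a generate-score-feedback loop:
    the LLM acts as the optimizer, the pretrained scorer as the objective."""
    best = []                                            # top candidates so far
    for _ in range(steps):
        candidates = llm_propose(feedback=best, n=32)    # text-only proposals
        ranked = sorted(candidates, key=lambda c: scorer(image, c),
                        reverse=True)
        best = ranked[:keep]                             # feed top-k back next round
    return best[0]                                       # highest-scoring caption
```

Swapping the scorer (audio-text, video-text) is what lets the same loop handle captioning across modalities without any gradient updates.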
RT @dtrinh: VERY excited about the era of generative AR we're bringing to life. Check out this preview! It's early but so damn promising…
Check out this result (pages 25-26) and more on arXiv:
arxiv.org
We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as...
Cc @GalChechik, since you were wondering what we’d been up to since the Emu Video work we were just talking about at ECCV 😊.
Indeed! Come talk to @imisra_ and me about Emu Video at the #ECCV2024 poster session at 10:30 AM 😀. Or maybe there’s more? 🤔
emu-video.metademolab.com
Factorizing Text-to-Video Generation by Explicit Image Conditioning
Excited to share Llama 3.1, which brings multimodal capabilities to your favorite open-source LLM using simple, post-trained adapters! Great experience building with our incredible multimodal team, and especially my partners in crime for all things video, @mannat_singh and @filipradenovic!
Starting today, open source is leading the way. Introducing Llama 3.1: our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models, including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context…
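For context on what “post-trained adapters” can look like in practice: one common compositional recipe keeps the LLM frozen and inserts gated cross-attention blocks that attend to vision-encoder features, training only the new layers. The sketch below is a generic adapter of that kind with illustrative dimensions; it is an assumption for exposition, not the released architecture.

```python
import torch
import torch.nn as nn

class CrossAttnAdapter(nn.Module):
    """Gated cross-attention adapter: frozen LLM hidden states attend to
    features from a separate vision encoder; only this block is trained."""
    def __init__(self, d_model=4096, n_heads=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.gate = nn.Parameter(torch.zeros(1))    # zero-init: adapter starts as a no-op

    def forward(self, h, vision_feats):
        # h: (B, T, d) LLM states; vision_feats: (B, V, d) projected vision tokens
        attn_out, _ = self.attn(self.norm(h), vision_feats, vision_feats)
        return h + torch.tanh(self.gate) * attn_out  # gated residual connection
```

The zero-initialized gate means the model exactly reproduces the text-only LLM before post-training begins, preserving the base model’s capabilities while the adapter learns.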
@xiaolonw @liliyu_lili @p_bojanowski Panel discussion with @_tim_brooks @xiaolonw @hila_chefer @liliyu_lili @antoine77340
@xiaolonw @liliyu_lili And the final talk of the day by @p_bojanowski on self-supervised learning of vision transformers!