Paras Jain
@parasjain
3K Followers · 2K Following · 72 Media · 540 Statuses
World Models | CEO of @genmoai
San Francisco, CA
Joined April 2009
Closed AI won the left brain of AGI. We're here to make sure there's an open alternative for the right brain. Mochi 1 sets a new SOTA for open-source video generation models. It is the strongest OSS model in the ecosystem. This will be a force for good, both for AI research and
Introducing Mochi 1 preview. A new SOTA in open-source video generation. Apache 2.0. magnet:?xt=urn:btih:441da1af7a16bcaa4f556964f8028d7113d21cbb&dn=weights&tr=udp://tracker.opentrackr.org:1337/announce
49 · 93 · 1K
What if video AI finally felt real? For years, we’ve seen “AI video” that looked more like animated slideshows — stilted motion, broken physics, and prompts that miss the mark. That just changed. Mochi 1 from @genmoai is an open-source breakthrough in video generation — and it
2 · 2 · 9
T2V: Guess which AI video model?😉 Hint: A pioneering AI video brand with a recent update, to be announced soon.
4 · 4 · 13
From hackers to studios, millions already build with ComfyUI. Today, we’re taking it further: Comfy Cloud 🌩️ The full power of ComfyUI, now in your browser. No installs. No limits. Just create. Join private beta for free👇
127 · 146 · 1K
Amazing work with one of our researchers @ShyamgopalKart1!
We are releasing a paper I'm very excited about. We know test-time scaling is a path to greatly improved results, and achieves reasoning in the case of LLMs. We present a new and promising way to amortize it into training using HyperNetworks for image generation models.
0 · 0 · 4
I'm really excited to share our new formulation for post-training diffusion models! Here's why I think this formulation has the potential to be quite useful 👇
Reward hacking is challenging when fine-tuning few-step Diffusion models. Direct fine-tuning on rewards can create artifacts that game metrics while degrading visual quality. We propose Noise Hypernetworks as a theoretically grounded solution, inspired by test-time optimization.
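The post leans on the idea of a hypernetwork: one network that outputs the weights of another, so a whole family of per-condition layers is amortized into a single model. A minimal NumPy sketch of that mechanism in isolation (all names and shapes are hypothetical illustration, not the paper's Noise Hypernetwork architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Target layer: y = x @ W, where W is *generated*, not learned directly.
d_in, d_out, d_cond = 4, 3, 2

# Hypernetwork: here just a linear map from a conditioning vector
# (e.g. a timestep or reward embedding) to the target layer's weights.
H = rng.standard_normal((d_cond, d_in * d_out)) * 0.1

def generated_layer(x, cond):
    W = (cond @ H).reshape(d_in, d_out)  # weights predicted from cond
    return x @ W

x = rng.standard_normal((5, d_in))
y_a = generated_layer(x, np.array([1.0, 0.0]))
y_b = generated_layer(x, np.array([0.0, 1.0]))
# Different conditioning -> different effective weights: one hypernetwork
# stands in for many per-condition layers, which is what lets test-time
# optimization be amortized into training.
```

Training the hypernetwork (rather than the target weights) is what moves the per-sample optimization cost from inference time into training time.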
5 · 8 · 67
Foundry → Mithril (@mithrilcompute): The AI Omnicloud. Now generally available! We’re redefining GPU cloud economics, workload flexibility, and ease-of-use—for the compound AI & agentic era. 🧵 (1/8)
5 · 13 · 89
Mirage was trained on Mochi 1! Amazing to see.
Today we're revealing the magic behind Mirage with the release of our technical report, linked below. ICYMI — Mirage, our omni-modal foundation model, generates expressive actors that actually look and feel human. Mirage is uniquely set apart by its ability to generate: •
3 · 1 · 15
Mining sparsity from Mochi 1 unlocks video-to-video editing! It was great collaborating with @ywen99 and @PandaAshwinee.
fine-grained editing of videos is hard. if I use a Video Diffusion Transformer to make my videos, just adding "red" to the prompt totally changes the video. in our new paper, we dive deep into the attention maps of VDiTs and find a way to do fine-grained editing, and other stuff!
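The attention maps the paper probes are the standard scaled-dot-product kind: each video token gets a distribution over prompt tokens. A minimal NumPy sketch of how such maps arise (a generic illustration with made-up shapes, not the paper's VDiT code):

```python
import numpy as np

def attention_maps(Q, K):
    """Row-stochastic attention maps: softmax(Q K^T / sqrt(d))."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_video, n_text, d = 6, 4, 8           # video tokens attending to prompt tokens
Q = rng.standard_normal((n_video, d))  # queries from video tokens
K = rng.standard_normal((n_text, d))   # keys from prompt tokens
A = attention_maps(Q, K)               # shape (6, 4); each row sums to 1

# Column A[:, j] shows how strongly every video token attends to prompt
# token j -- the kind of map one can inspect to localize the effect of
# adding a word like "red" to the prompt.
```

Sparsity in these maps (most video tokens attending to few prompt tokens) is what makes localized, fine-grained edits plausible.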
1 · 1 · 12
We also release our training code and details here https://t.co/1tAeWpIHdd
github.com
Code for full finetuning of the Mochi model with FSDP (and CP) - Yaofang-Liu/Mochi-Full-Finetuner
2 · 1 · 14
Pusa is out on Hugging Face

Pusa: Thousands Timesteps Video Diffusion Model

A single model that unlocks:
• Text-to-Video
• Image-to-Video
• Start/End Frames to Video
• Video Transitions
• Video Extensions
• Next-frame prediction
• Novel sampling
13 · 89 · 408
Start and end key frame support is here on Mochi!
Never seen an R1 moment in video diffusion models??😰Can't things just emerge using very low cost??🧐Certainly can!!!! 🚀 Introducing Pusa now! Pusa: Thousands Timesteps Video Diffusion Model — A single model that unlocks: Text-to-Video → • Image-to-Video •
0 · 1 · 5
Mechanistic interpretability researcher:
1 · 1 · 17
Meet Phonic, the next-generation speech-to-speech platform focused on reliability We’ve all gotten stuck speaking on the phone to an AI that doesn’t understand you At Phonic, we’ve rethought the whole stack from model training to voice evals to compound systems for reliability
12 · 21 · 156
I’m presenting about Mochi 1 and video generation here at Edge, excited to talk about recent community progress
luma.com
Take Intelligence to the Edge - Together We Break Through AI is redefining creativity, powering real-time game design, film production, music, and interactive…
1 · 0 · 9
Awesome work that makes Mochi dance!
Text-to-video models are silent🔇, but does that mean they don't know music, beat, and tempo🎶? I'm excited to present MusicInfuser🎹, an adapter network which aligns silent dancing videos to music. Check out our paper, examples, code, and weights here: https://t.co/6jvEb9H40x
0 · 0 · 4