Kevin Chih-Yao Ma
@chihyaoma
Followers: 623
Following: 465
Media: 30
Statuses: 103
Building multimodal foundation models @MicrosoftAI | Past: a lead IC & babysitter of Meta's MovieGen, Emu, Imagine, ...
Seattle, WA
Joined August 2014
💼 [Career Update] I feel fortunate to have spent the past 4 years at Meta, working with some of the brightest minds to build Meta’s multimodal foundation models — MovieGen, Emu, and more. It shaped me not only as a researcher but as a person. Now, it’s time for a new chapter. I…
8
4
146
Meet our third @MicrosoftAI model: MAI-Image-1
#9 on LMArena, striking an impressive balance of generation speed and quality
Excited to keep refining + climbing the leaderboard from here! We're just getting started. https://t.co/33BiNfIjPg
37
82
518
Vision tokenizers are stuck in 2020 🤔 while language models revolutionized AI 🚀
Language: One tokenizer for everything
Vision: Fragmented across modalities & tasks
Introducing AToken: The first unified visual tokenizer for images, videos & 3D that does BOTH reconstruction AND…
Apple presents AToken: A unified visual tokenizer
• First tokenizer unifying images, videos & 3D
• Shared 4D latent space (preserves both reconstruction & semantics)
• Strong across gen & understanding tasks (ImageNet 82.2%, MSRVTT 32.6%, 3D acc 90.9%)
6
73
374
🐺 Introducing the Werewolf Benchmark, an AI test for social reasoning under pressure. Can models lead, bluff, and resist manipulation in live, adversarial play? 👉 We made 7 of the strongest LLMs, both open-source and closed-source, play 210 full games of Werewolf. Below is…
80
142
774
A picture is now worth more than a thousand words in genAI; it can be turned into a full 3D world! And you can stroll through this garden for as long as you like; it will still be there.
149
345
3K
What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵
828
3K
14K
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by the brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock the next AI breakthrough with…
227
653
4K
Introducing AlphaEarth Foundations, an AI model that integrates petabytes of satellite data into a single digital representation of Earth. It'll give scientists a nearly real-time view of the planet at incredible spatial precision, and help with critical issues like food security…
deepmind.google
New AI model integrates petabytes of Earth observation data to generate a unified data representation that revolutionizes global mapping and monitoring
Our new AI model AlphaEarth Foundations is mapping the planet in astonishing detail. 🌏🔍 Scientists will now be able to track the impact of deforestation, monitor crop health, and more – significantly faster, thanks to our new datasets. 🧵
122
450
3K
Scaling CLIP on English-only data is outdated now…
🌍 We built a CLIP data curation pipeline for 300+ languages
🇬🇧 We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳 It’s time to drop the language filter!
📝 https://t.co/pQuwzH053M [1/5] 🧵
3
92
312
Excited to share what I worked on during my time at Meta.
- We introduce a Triton-accelerated Transformer with *2-simplicial attention*, a tri-linear generalization of dot-product attention
- We show how to adapt RoPE to tri-linear forms
- We show 2-simplicial attention scales…
26
98
816
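For context on the tri-linear form named in the post above, here is a minimal NumPy sketch of what a 2-simplicial attention score could look like: the logit for each (query, key, key') triple is a trilinear product instead of a dot product. The scaling, the joint softmax over key pairs, and the elementwise value combination are assumptions made for illustration, not the paper's Triton kernel.

```python
import numpy as np

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Hedged sketch of 2-simplicial (tri-linear) attention.

    q:      (n, d) queries
    k1, k2: (n, d) two sets of keys
    v1, v2: (n, d) two sets of values

    Standard attention scores are bilinear (q_i . k_j); the tri-linear
    generalization scores every (query, key, key') triple:
        A[i, j, k] = sum_d q[i, d] * k1[j, d] * k2[k, d]
    """
    n, d = q.shape
    # Tri-linear logits over (i, j, k); scaled like dot-product attention.
    logits = np.einsum("id,jd,kd->ijk", q, k1, k2) / np.sqrt(d)
    # Softmax jointly over all (j, k) key pairs for each query i.
    flat = logits.reshape(n, n * n)
    weights = np.exp(flat - flat.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    weights = weights.reshape(n, n, n)
    # One plausible value combination: elementwise product of the two values.
    pair_values = np.einsum("jd,kd->jkd", v1, v2)        # (n, n, d)
    return np.einsum("ijk,jkd->id", weights, pair_values)  # (n, d)

# Tiny usage example with random tensors.
rng = np.random.default_rng(0)
q, k1, k2, v1, v2 = (rng.standard_normal((4, 8)) for _ in range(5))
print(two_simplicial_attention(q, k1, k2, v1, v2).shape)  # (4, 8)
```

Note that this naive form is cubic in sequence length, which is presumably why an efficient (e.g. Triton-accelerated) kernel matters in practice.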
Self-supervised learning (SSL) vs. contrastive language-image (CLIP) models is a never-ending battle. What about using both? This Google paper does exactly that, and the results are really good on many different tasks. TIPS is a model trained with a CLIP loss and 2 SSL losses [1/9]
5
75
602
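Since the post above only names the ingredients (a CLIP loss plus two SSL losses), here is a hedged sketch of how such a combination is typically wired up. The contrastive term is the standard symmetric InfoNCE form; the specific SSL objectives and the loss weights are placeholders, not the TIPS recipe.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Standard symmetric CLIP-style InfoNCE loss.

    img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs.
    """
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def combined_loss(clip_l, ssl_a, ssl_b, w_a=1.0, w_b=1.0):
    # Placeholder weighting of one contrastive loss and two SSL losses;
    # the actual objectives and weights in TIPS may differ.
    return clip_l + w_a * ssl_a + w_b * ssl_b

# Usage with random embeddings (illustration only).
img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
print(clip_contrastive_loss(img, txt))
```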
Transition Matching (TM) -- a discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. Claimed to be the first fully causal model to match or surpass the performance of flow-based methods on…
[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! @urielsinger @itai_gat @lipmanya
0
0
5
We're taking a big step towards medical superintelligence. AI models have aced multiple-choice medical exams – but real patients don’t come with ABC answer options. Now MAI-DxO can solve some of the world’s toughest open-ended cases with higher accuracy and lower costs.
139
506
3K
Q-learning is not yet scalable https://t.co/hoYUdAAeGZ I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
37
194
1K
We're excited to launch Scouts — always-on AI agents that monitor the web for anything you care about.
139
174
3K
Seedance 1.0 has just claimed the #1 spot on the @ArtificialAnlys Video Arena Leaderboard! 🚀 Proudly topping the charts in BOTH “text-to-video” and “image-to-video” categories, Seedance 1.0 sets a new standard for AI video creation. Stay tuned as Seedance 1.0 will soon be…
23
21
185
Is the new king of video generation emerging? ByteDance’s Seed team is showing serious potential with their latest model, possibly outshining DeepMind’s Veo3, which was just announced weeks ago. I have been impressed by the consistent stream of inspiring work from Seed.
0
2
5
new paper from our work at Meta! **GPT-style language models memorize 3.6 bits per param**
we compute capacity by measuring total bits memorized, using some theory from Shannon (1953)
shockingly, the memorization-datasize curves look like this: [flattened ASCII sketch of the memorization-vs-dataset-size curve] (🧵)
83
377
3K
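To make the headline figure concrete, here is a quick back-of-the-envelope conversion from bits per parameter to total memorized content; the model sizes below are illustrative and not taken from the paper.

```python
# Back-of-the-envelope check of the "3.6 bits per parameter" figure.
# Model sizes here are made up for illustration only.
BITS_PER_PARAM = 3.6

for n_params in (125e6, 1.3e9, 7e9):
    total_bits = BITS_PER_PARAM * n_params
    total_mb = total_bits / 8 / 1e6  # bits -> bytes -> megabytes
    print(f"{n_params / 1e9:>5.2f}B params ≈ {total_mb:,.0f} MB of memorized content")
```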
A recent insight I gained is viewing AI research as a “max-performance domain”, which means that you can be world-class by being very good at only one part of your job. As long as you can create seminal impact (e.g., train the best model, start a new paradigm, or create…
27
55
668