David W. Romero
@davidwromero
Followers
3K
Following
1K
Media
61
Statuses
655
AI Research @Cartesia_ai. Prev: @NVIDIA, @GoogleAI, @Qualcomm, @Merl_news, PhD in Efficient Deep Learning @VUamsterdam. Opinions my own.
San Francisco, CA
Joined October 2019
After two years of great learnings and beautiful memories at @nvidia, I am happy to announce that I have joined @cartesia_ai! Here, I will be developing the next generation of neural architectures, with the goal of creating real-time, intelligent AI agents with life-long…
5
4
123
The @cartesia_ai research team will be at NeurIPS this year, and we will be hosting a Happy Hour event! Want to talk about how we will achieve multimodal real-time life-long context? Come join us! 🥳
We're headed to NeurIPS next month. We'll be sponsoring the conference and will have an expo booth where you can meet us. We're also hosting the after-hours event we always wish had existed at NeurIPS! Hang out with @_albertgu, me, and the rest of the team. Limited spots, sign up below.
0
1
17
We are hiring @cartesia_ai! 🚀 We're building real-time multimodal models with life-long (really long!) context. We're looking for: - Large-scale training infra experts (TP/CP/SP/FSDP2) - People with an interest in architectures beyond pure transformers. Interested? DM me!
35
37
508
Today, we are releasing Sonic-3, the fastest, most natural AI voice model out there, together with a guide to clone your own voice in less than 10 minutes. Give it a try! Details in @krandiash's thread below! 👇
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation. What makes Sonic-3 great: - Breakthrough naturalness - laughter and full emotional range - Lightning fast -
0
2
23
Part of Meta layoffs? We're hiring researchers and engineers at @cartesia_ai (SF + London) We're inventing crazy new architectures and using a large-scale data engine to build infinite context assistants We like researchers who get their hands dirty DM or email
13
28
278
At Cartesia, we’re committed to advancing voice AI for the enterprise. That’s why we’re proud to integrate Cartesia’s state-of-the-art voice AI technology with new @ServiceNow AI Voice Agents, part of the company’s recently announced AI Experience. Read more:
0
10
39
We are honored to be featured in Fast Company's Next Big Things in Tech for our work in giving conversational AI a powerful voice. We’re proud to be recognized alongside so many inspiring builders and innovators pushing the frontier of technology forward. A huge thank you to our
fastcompany.com
From workplace productivity to medical research to gaming and beyond, this was the year AI got real—and it was hardly the only source of 2025’s breakthroughs.
1
4
19
What a team effort! Very happy and proud of the entire team! 💚
Ranked #1 on @Meta's Physical Reasoning Leaderboard on @huggingface for a reason. 👏 🔥 🏆 Cosmos Reason enables robots and AI agents to reason like humans by leveraging prior knowledge, physics, and common sense to intelligently interact with the real world. This
1
0
15
Why do video models handle motion so poorly? It might be lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: https://t.co/dkk43PyQe3 Blog: https://t.co/I1gpam1OL8 1/🧵
8
72
396
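The core idea behind the thread, as I read it: a sequence model is equivariant to a motion (a "flow") if transforming the input and then running the model matches running the model and then transforming its output. Below is a minimal sketch that checks the simplest version of that property, shift equivariance of a circular 1D convolution. It only illustrates the equivariance condition; it is not the FERNN architecture, and all names here are my own.

```python
# Illustrative only: checks shift equivariance for a circular 1D convolution,
# i.e. conv(shift(x)) == shift(conv(x)). FERNNs extend this kind of symmetry
# constraint to flows over time; this toy is not their code.
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

x = torch.randn(1, 1, 32)            # (batch, channels, length)
shift = 5

y_then_shift = torch.roll(conv(x), shifts=shift, dims=-1)
shift_then_y = conv(torch.roll(x, shifts=shift, dims=-1))

print(torch.allclose(y_then_shift, shift_then_y, atol=1e-6))  # True: shift-equivariant
```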
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
98
754
5K
Interested in efficient equivariance for long-context? Visit our Geometric Hyena poster at ICML! ⭐️ Spotlight ⭐️ When: 11 a.m. to 1:30 p.m., Wed July 16 Where: East Exhibition Hall A-B #E-3103
ICML Spotlight 🚨 Equivariance is too slow and expensive, especially when you need global context. It makes us wonder whether it is even worth the cost, especially in high-dimensional problems. We present Geometric Hyena Networks, a simple equivariant model orders of magnitude more…
0
4
22
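For context on why long convolutions give cheap global context (the "Hyena" part of the name), here is a minimal FFT-based long-convolution sketch: a filter as long as the sequence applied globally in O(L log L). This is a generic illustration of the mechanism only; the function name and shapes are my own assumptions, not the Geometric Hyena implementation.

```python
# Minimal sketch of a Hyena-style long convolution: a filter as long as the
# sequence, applied globally via FFT in O(L log L). Generic illustration;
# names and shapes are assumptions, not the Geometric Hyena code.
import torch

def fft_long_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal convolution of x (B, L, D) with a per-channel filter k (L, D)."""
    B, L, D = x.shape
    n = 2 * L                                   # zero-pad to avoid wrap-around
    X = torch.fft.rfft(x, n=n, dim=1)
    K = torch.fft.rfft(k, n=n, dim=0)
    y = torch.fft.irfft(X * K.unsqueeze(0), n=n, dim=1)[:, :L]
    return y

x = torch.randn(2, 1024, 16)                    # (batch, length, channels)
k = torch.randn(1024, 16) / 1024                # one implicit filter per channel
print(fft_long_conv(x, k).shape)                # torch.Size([2, 1024, 16])
```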
I'll be at ICML this week. Reach out if you'd like to chat about 👉 research at @cartesia_ai 👉 alternate architectures and tokenizer-free hierarchies 👉 the future of voice and multimodal interaction. We also have a booth where you can come by and say hi to the team.
0
4
57
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
61
197
1K
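As rough intuition for "dynamic chunking", in my reading of the thread: a small network scores each byte as a potential chunk boundary, and byte representations are pooled into variable-length chunks that a higher-level model then operates on. The toy sketch below follows that reading; the module names, threshold, and pooling choice are illustrative assumptions, not the actual H-Net code.

```python
# Toy sketch of dynamic chunking as I read the H-Net thread: score each byte
# as a boundary, split at high-scoring positions, and mean-pool each chunk.
# All names and thresholds are illustrative assumptions, not H-Net itself.
import torch
import torch.nn as nn

class ToyDynamicChunker(nn.Module):
    def __init__(self, d_model: int = 64, vocab: int = 256, threshold: float = 0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)        # byte embeddings
        self.boundary = nn.Linear(d_model, 1)            # per-position boundary score
        self.threshold = threshold

    def forward(self, byte_ids: torch.Tensor) -> list[torch.Tensor]:
        h = self.embed(byte_ids)                         # (L, d_model)
        p = torch.sigmoid(self.boundary(h)).squeeze(-1)  # (L,) boundary probabilities
        cuts = (p > self.threshold).nonzero().flatten().tolist()
        chunks, start = [], 0
        for c in cuts + [len(byte_ids)]:
            if c > start:                                # pool the bytes of one chunk
                chunks.append(h[start:c].mean(dim=0))
                start = c
        return chunks                                    # variable-length list of chunk vectors

text = b"the cat sat on the mat"
ids = torch.tensor(list(text))
print(len(ToyDynamicChunker()(ids)))                     # number of discovered chunks
```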
HMAR models and code are finally public! 🥳
Happy to share that our HMAR code and pre-trained models are now publicly available. Please try them out here: code: https://t.co/HZloGGrLFG checkpoints:
0
1
5
I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.
27
117
783
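Since the post contrasts SSMs and Transformers, here is the basic linear state-space recurrence the tradeoff is usually framed around: a fixed-size hidden state updated once per step (constant memory, O(L) time), versus attention's key-value cache that grows with the sequence. A minimal sketch; the matrices are random placeholders, not a trained model such as Mamba.

```python
# Minimal linear state-space model (SSM) recurrence: h_t = A h_{t-1} + B x_t,
# y_t = C h_t. Constant-size state per step, unlike attention's growing KV cache.
# Random matrices, for illustration only (not any released model).
import torch

def ssm_scan(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    L, _ = x.shape
    h = torch.zeros(A.shape[0])
    ys = []
    for t in range(L):                       # sequential scan: O(L) time, O(1) state
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    return torch.stack(ys)                   # (L, d_out)

L, d_in, d_state, d_out = 128, 8, 16, 8
A = 0.9 * torch.eye(d_state)                 # stable placeholder dynamics
B = torch.randn(d_state, d_in) * 0.1
C = torch.randn(d_out, d_state) * 0.1
y = ssm_scan(torch.randn(L, d_in), A, B, C)
print(y.shape)                               # torch.Size([128, 8])
```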
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: https://t.co/TcqqvrhqAD Huggingface: https://t.co/hOLno2IyhW Code: https://t.co/UUg90bmcGW Project page: https://t.co/Dr6ZqnKM8o (1/n)
2
32
101