
kyutai (@kyutai_labs) · Paris, France · Joined November 2023
24K Followers · 112 Following · 56 Media · 111 Statuses
Talk to https://t.co/CpQTspHXbi 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the
🚀New models: ARC-Encoders
We introduce a lightweight encoder that compresses context into continuous representations for LLMs, reducing inference cost while preserving performance. Our Adaptable text Representations Compressor, named ARC-Encoder, achieves large efficiency gains
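The general idea of compressing context into continuous representations can be sketched in PyTorch as follows; the chunked mean-pooling, dimensions, and class name are illustrative assumptions, not the ARC-Encoder architecture described in the paper.

```python
import torch
import torch.nn as nn

class ToyContextCompressor(nn.Module):
    """Compress a long context into k continuous vectors in the LLM's embedding space.
    Illustrative only: chunked mean-pooling plus a linear projection, not ARC-Encoder."""
    def __init__(self, enc_dim: int = 384, llm_dim: int = 4096, k: int = 16):
        super().__init__()
        self.k = k
        layer = nn.TransformerEncoderLayer(enc_dim, nhead=6, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, context_embeddings: torch.Tensor) -> torch.Tensor:
        # context_embeddings: (batch, ctx_len, enc_dim), with ctx_len >= k
        h = self.encoder(context_embeddings)
        # Split the encoded context into k chunks and mean-pool each one.
        chunks = torch.chunk(h, self.k, dim=1)
        pooled = torch.stack([c.mean(dim=1) for c in chunks], dim=1)  # (batch, k, enc_dim)
        return self.proj(pooled)  # (batch, k, llm_dim): stands in for the raw context tokens

compressor = ToyContextCompressor()
ctx = torch.randn(1, 2048, 384)       # embeddings of a 2048-token retrieved context
compressed = compressor(ctx)          # (1, 16, 4096): far fewer positions for the LLM to attend over
prompt = torch.randn(1, 32, 4096)     # embeddings of the user question itself
llm_inputs = torch.cat([compressed, prompt], dim=1)  # fed to the LLM in place of the raw context
```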
When combined, our TTS and STT can provide a voice to any text LLM, VLM, etc., allowing real-time voice interaction. Try that out in our demo: https://t.co/CpQTspIv0Q Paper: https://t.co/ZvsEElbAmE Models: https://t.co/KjDMx7Kgdd TTS evaluation: github.com/kyutai-labs/tts_longeval
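Conceptually, the wrapping is STT → text LLM → TTS. A minimal sketch of that composition is below; the class and method names are placeholders rather than the Unmute API, and a real system would stream all three stages concurrently instead of waiting for the full turn.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class VoicePipeline:
    # Placeholder interfaces for illustration, not the Unmute/Kyutai API.
    stt: Callable[[bytes], str]             # audio chunk -> transcript fragment
    llm: Callable[[str], str]               # user text -> assistant text
    tts: Callable[[str], Iterator[bytes]]   # assistant text -> audio chunks

    def turn(self, audio_chunks: Iterator[bytes]) -> Iterator[bytes]:
        """One conversational turn: transcribe the incoming audio, query the text LLM,
        then stream synthesized speech back to the user."""
        transcript = "".join(self.stt(chunk) for chunk in audio_chunks)
        reply = self.llm(transcript)
        yield from self.tts(reply)

# Toy stand-ins so the skeleton runs end to end without any models.
pipeline = VoicePipeline(
    stt=lambda chunk: chunk.decode(errors="ignore"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: iter([text.encode()]),
)
print(b"".join(pipeline.turn(iter([b"hello ", b"world"]))))
```

Replacing the toy lambdas with a streaming STT, any text LLM, and a streaming TTS gives the kind of real-time loop the demo describes.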
We also train a model on a mix of publicly available datasets, to show that the improvements come from the architecture and not just proprietary data.
We train decoder-only models on time-aligned text and audio data, delaying the “output stream” so that it can be predicted from the “input stream”. At inference, we teacher-force the input stream and predict the output stream. This way, we can build both TTS and STT.
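A minimal PyTorch sketch of that delayed-streams arrangement, assuming a toy backbone, a delay of two frames, and summed per-stream embeddings (illustrative choices, not the actual DSM implementation):

```python
import torch
import torch.nn as nn

DELAY = 2   # output stream lags the input stream by 2 frames (illustrative value)
PAD = 0     # padding token id (assumption)

def delay_stream(tokens: torch.Tensor, delay: int, pad: int = PAD) -> torch.Tensor:
    """Shift a (batch, time) token stream right by `delay` frames, padding the front."""
    if delay == 0:
        return tokens
    front = torch.full((tokens.shape[0], delay), pad, dtype=tokens.dtype, device=tokens.device)
    return torch.cat([front, tokens[:, :-delay]], dim=1)

class TinyDSM(nn.Module):
    """Toy decoder over two time-aligned streams: at each frame, the embeddings of the
    input-stream token and of the delayed output-stream token are summed."""
    def __init__(self, vocab_in: int, vocab_out: int, dim: int = 256):
        super().__init__()
        self.emb_in = nn.Embedding(vocab_in, dim)
        self.emb_out = nn.Embedding(vocab_out, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab_out)

    def forward(self, input_stream: torch.Tensor, output_stream: torch.Tensor):
        delayed = delay_stream(output_stream, DELAY)               # output lags by DELAY frames
        T = input_stream.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(T)   # causal: frame t sees 0..t
        h = self.emb_in(input_stream) + self.emb_out(delayed)
        h = self.backbone(h, mask=mask)
        logits = self.head(h[:, :-1])    # frame t predicts the delayed output token at frame t+1
        target = delayed[:, 1:]
        return logits, target

# TTS: input_stream = aligned text tokens, output_stream = audio codec tokens. STT: swap them.
model = TinyDSM(vocab_in=1000, vocab_out=2048)
text = torch.randint(0, 1000, (4, 50))     # time-aligned text tokens
audio = torch.randint(0, 2048, (4, 50))    # time-aligned audio tokens
logits, target = model(text, audio)
loss = nn.functional.cross_entropy(logits.reshape(-1, 2048), target.reshape(-1))
```

Because every stream advances exactly one frame per model step, many live sessions can be stacked along the batch dimension, which is the property behind the throughput numbers in the following post.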
The throughput gains come from the fact that DSMs are easy to batch. The same is true for Kyutai Speech-To-Text, which outperforms Whisper-Streaming in throughput by two orders of magnitude.
We’ve released a preprint on Delayed Streams Modeling (DSM), the framework behind our open, streaming text-to-speech and speech-to-text. Kyutai TTS, powered by DSM, is blazingly fast and competitive with SotA models in quality while providing the best voice cloning. 🧵
🗣️We can listen and speak simultaneously when we talk, and so should spoken dialogue models (SDMs)! 💬Unlike typical "walkie-talkie" voice AIs, full-duplex SDMs let both sides talk at once - more like real, natural conversation. But this makes alignment harder: - No
If you're at #ICML2025 this week, come check out these 3 posters from our lab🟢!
- Aligning Spoken Dialog Models from User Interactions, @anne_youw, Thu 17 Jul 11am-1:30pm, W-316
- High-Fidelity Simultaneous Speech-To-Speech Translation, @tom_labiausse, Wed 16 Jul 4:30pm-7pm
I’m happy to share that I’ll be attending ICML 2025 in Vancouver next week to present 𝐇𝐢𝐛𝐢𝐤𝐢 [ https://t.co/q50LBOvM1h] 🇫🇷🇬🇧 — Kyutai’s real-time and expressive speech translation system. I'll be presenting the poster on Wednesday, July 16 at 4:30PM, feel free to stop by! 💬
Kyutai TTS comes with hundreds of voices based on Expresso and VCTK. If you would like to see more voices, help us by donating your voice at
Unmute turns a text LLM into a voice AI. At https://t.co/CpQTspIv0Q, it’s @MistralAI's Mistral-Small-3.2-24B, making it fully open-source. Play a quiz game with a snarky host, catch up on tech news, or just hang out and talk. Or modify it to do anything you want!
We’re also releasing the code for https://t.co/CpQTspIv0Q, the modular voice AI system. Make your own personal assistant, make it role-play, give a voice to your agent, or connect it to external tools. You can make it fit onto a single GPU. https://t.co/TgNPIBkuDD
Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users with a 350ms latency on a single L40S. Try it out and get started on the project page: https://t.co/B4P9FuOrQc
Available in PyTorch, MLX, on your iPhone, or in Rust for your server needs! Project Page: https://t.co/bQMP56XIAa OpenASR Leaderboard: huggingface.co
Our latest open-source speech-to-text model just claimed 1st place among streaming models and 5th place overall on the OpenASR leaderboard 🥇🎙️ While all other models need the whole audio, ours delivers top-tier accuracy on streaming content. Open, fast, and ready for production!
The other model is a lightweight English/French 1B model optimized for real-time voice chat apps like https://t.co/CpQTspHXbi. It comes with a semantic voice activity detector that predicts if you’re done talking or just pausing mid-sentence. The open-source releases of Kyutai
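A rough sketch of how a semantic end-of-turn predictor can sit on top of a streaming STT model; the feature source, classifier, and threshold here are assumptions for illustration, not the released detector.

```python
import torch
import torch.nn as nn

class ToySemanticVAD(nn.Module):
    """Toy end-of-turn predictor: reads a per-frame hidden state from the streaming STT
    model and outputs P(the user is done talking). Illustrative only, not Kyutai's model."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 128), nn.GELU(), nn.Linear(128, 1))

    def forward(self, frame_state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.head(frame_state)).squeeze(-1)

vad = ToySemanticVAD()
END_OF_TURN_THRESHOLD = 0.8   # assumption: tuned per application

for t in range(100):                       # one iteration per streamed audio frame
    frame_state = torch.randn(1, 512)      # stand-in for the STT hidden state at frame t
    if vad(frame_state).item() > END_OF_TURN_THRESHOLD:
        break   # hand the turn to the LLM now, instead of waiting on a fixed silence timeout
```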
Today we are releasing two models. The first one is a 2.6B English-only model that beats Whisper Large v3 on benchmarks even though it’s a streaming model that doesn’t process all the audio at once. It can process 400 sequences in parallel on a single H100.
Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here: https://t.co/bQMP56XaKC
Using Unmute with a custom voice and prompt to create a very intense ice cream seller, inspired by Justin Kuritzkes' sketch🍦