
kyutai (@kyutai_labs) · Paris, France · Joined November 2023
24K Followers · 112 Following · 56 Media · 111 Statuses
Talk to https://t.co/CpQTspHXbi 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the
🚀New models: ARC-Encoders
We introduce a lightweight encoder that compresses context into continuous representations for LLMs, reducing inference cost while preserving performance. Our Adaptable text Representations Compressor, named ARC-Encoder, achieves large efficiency gains
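The general idea of compressing context into continuous representations can be sketched in PyTorch as follows; the chunked mean-pooling, dimensions, and class name are illustrative assumptions, not the ARC-Encoder architecture described in the paper.

```python
import torch
import torch.nn as nn

class ToyContextCompressor(nn.Module):
    """Compress a long context into k continuous vectors in the LLM's embedding space.
    Illustrative only: chunked mean-pooling plus a linear projection, not ARC-Encoder."""
    def __init__(self, enc_dim: int = 384, llm_dim: int = 4096, k: int = 16):
        super().__init__()
        self.k = k
        layer = nn.TransformerEncoderLayer(enc_dim, nhead=6, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, context_embeddings: torch.Tensor) -> torch.Tensor:
        # context_embeddings: (batch, ctx_len, enc_dim), with ctx_len >= k
        h = self.encoder(context_embeddings)
        # Split the encoded context into k chunks and mean-pool each one.
        chunks = torch.chunk(h, self.k, dim=1)
        pooled = torch.stack([c.mean(dim=1) for c in chunks], dim=1)  # (batch, k, enc_dim)
        return self.proj(pooled)  # (batch, k, llm_dim): stands in for the raw context tokens

compressor = ToyContextCompressor()
ctx = torch.randn(1, 2048, 384)       # embeddings of a 2048-token retrieved context
compressed = compressor(ctx)          # (1, 16, 4096): far fewer positions for the LLM to attend over
prompt = torch.randn(1, 32, 4096)     # embeddings of the user question itself
llm_inputs = torch.cat([compressed, prompt], dim=1)  # fed to the LLM in place of the raw context
```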
When combined, our TTS and STT can provide a voice to any text LLM, VLM, etc., allowing real-time voice interaction. Try that out in our demo: https://t.co/CpQTspIv0Q Paper: https://t.co/ZvsEElbAmE Models: https://t.co/KjDMx7Kgdd TTS evaluation: github.com/kyutai-labs/tts_longeval
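Conceptually, the wrapping is STT → text LLM → TTS. A minimal sketch of that composition is below; the class and method names are placeholders rather than the Unmute API, and a real system would stream all three stages concurrently instead of waiting for the full turn.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class VoicePipeline:
    # Placeholder interfaces for illustration, not the Unmute/Kyutai API.
    stt: Callable[[bytes], str]             # audio chunk -> transcript fragment
    llm: Callable[[str], str]               # user text -> assistant text
    tts: Callable[[str], Iterator[bytes]]   # assistant text -> audio chunks

    def turn(self, audio_chunks: Iterator[bytes]) -> Iterator[bytes]:
        """One conversational turn: transcribe the incoming audio, query the text LLM,
        then stream synthesized speech back to the user."""
        transcript = "".join(self.stt(chunk) for chunk in audio_chunks)
        reply = self.llm(transcript)
        yield from self.tts(reply)

# Toy stand-ins so the skeleton runs end to end without any models.
pipeline = VoicePipeline(
    stt=lambda chunk: chunk.decode(errors="ignore"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: iter([text.encode()]),
)
print(b"".join(pipeline.turn(iter([b"hello ", b"world"]))))
```

Replacing the toy lambdas with a streaming STT, any text LLM, and a streaming TTS gives the kind of real-time loop the demo describes.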
We also train a model on a mix of publicly available datasets, to show that the improvements come from the architecture and not just proprietary data.
We train decoder-only models on time-aligned text and audio data, delaying the “output stream” so that it can be predicted from the “input stream”. At inference, we teacher-force the input stream and predict the output stream. This way, we can build both TTS and STT.
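A minimal PyTorch sketch of that delayed-streams arrangement, assuming a toy backbone, a delay of two frames, and summed per-stream embeddings (illustrative choices, not the actual DSM implementation):

```python
import torch
import torch.nn as nn

DELAY = 2   # output stream lags the input stream by 2 frames (illustrative value)
PAD = 0     # padding token id (assumption)

def delay_stream(tokens: torch.Tensor, delay: int, pad: int = PAD) -> torch.Tensor:
    """Shift a (batch, time) token stream right by `delay` frames, padding the front."""
    if delay == 0:
        return tokens
    front = torch.full((tokens.shape[0], delay), pad, dtype=tokens.dtype, device=tokens.device)
    return torch.cat([front, tokens[:, :-delay]], dim=1)

class TinyDSM(nn.Module):
    """Toy decoder over two time-aligned streams: at each frame, the embeddings of the
    input-stream token and of the delayed output-stream token are summed."""
    def __init__(self, vocab_in: int, vocab_out: int, dim: int = 256):
        super().__init__()
        self.emb_in = nn.Embedding(vocab_in, dim)
        self.emb_out = nn.Embedding(vocab_out, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab_out)

    def forward(self, input_stream: torch.Tensor, output_stream: torch.Tensor):
        delayed = delay_stream(output_stream, DELAY)               # output lags by DELAY frames
        T = input_stream.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(T)   # causal: frame t sees 0..t
        h = self.emb_in(input_stream) + self.emb_out(delayed)
        h = self.backbone(h, mask=mask)
        logits = self.head(h[:, :-1])    # frame t predicts the delayed output token at frame t+1
        target = delayed[:, 1:]
        return logits, target

# TTS: input_stream = aligned text tokens, output_stream = audio codec tokens. STT: swap them.
model = TinyDSM(vocab_in=1000, vocab_out=2048)
text = torch.randint(0, 1000, (4, 50))     # time-aligned text tokens
audio = torch.randint(0, 2048, (4, 50))    # time-aligned audio tokens
logits, target = model(text, audio)
loss = nn.functional.cross_entropy(logits.reshape(-1, 2048), target.reshape(-1))
```

Because every stream advances exactly one frame per model step, many live sessions can be stacked along the batch dimension, which is the property behind the throughput numbers in the following post.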
The throughput gains come from the fact that DSMs are easy to batch. The same is true for Kyutai Speech-To-Text, which outperforms Whisper-Streaming in throughput by two orders of magnitude.
We’ve released a preprint on Delayed Streams Modeling (DSM), the framework behind our open, streaming text-to-speech and speech-to-text. Kyutai TTS, powered by DSM, is blazingly fast and competitive with SotA models in quality while providing the best voice cloning. 🧵
🗣️We can listen and speak simultaneously when we talk, and so should spoken dialogue models (SDMs)! 💬Unlike typical "walkie-talkie" voice AIs, full-duplex SDMs let both sides talk at once - more like real, natural conversation. But this makes alignment harder: - No
If you're at #ICML2025 this week, come check out these 3 posters from our lab🟢!
- Aligning Spoken Dialog Models from User Interactions, @anne_youw, Thu 17 Jul 11am-1:30pm, W-316
- High-Fidelity Simultaneous Speech-To-Speech Translation, @tom_labiausse, Wed 16 Jul 4:30pm-7pm
I’m happy to share that I’ll be attending ICML 2025 in Vancouver next week to present 𝐇𝐢𝐛𝐢𝐤𝐢 [ https://t.co/q50LBOvM1h] 🇫🇷🇬🇧 — Kyutai’s real-time and expressive speech translation system. I'll be presenting the poster on Wednesday, July 16 at 4:30PM, feel free to stop by! 💬
Kyutai TTS comes with hundreds of voices based on Expresso and VCTK. If you would like to see more voices, help us by donating your voice at
Unmute turns a text LLM into a voice AI. At https://t.co/CpQTspIv0Q, it’s @MistralAI's Mistral-Small-3.2-24B, making it fully open-source. Play a quiz game with a snarky host, catch up on tech news, or just hang out and talk. Or modify it to do anything you want!
We’re also releasing the code for https://t.co/CpQTspIv0Q, the modular voice AI system. Make your own personal assistant, make it role-play, give a voice to your agent, or connect it to external tools. You can make it fit onto a single GPU. https://t.co/TgNPIBkuDD
Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users with a 350ms latency on a single L40S. Try it out and get started on the project page: https://t.co/B4P9FuOrQc
Available in PyTorch, MLX, on your iPhone, or in Rust for your server needs! Project Page: https://t.co/bQMP56XIAa OpenASR Leaderboard: huggingface.co
Our latest open-source speech-to-text model just claimed 1st place among streaming models and 5th place overall on the OpenASR leaderboard 🥇🎙️ While all other models need the whole audio, ours delivers top-tier accuracy on streaming content. Open, fast, and ready for production!
The other model is a lightweight English/French 1B model optimized for real-time voice chat apps like https://t.co/CpQTspHXbi. It comes with a semantic voice activity detector that predicts if you’re done talking or just pausing mid-sentence. The open-source releases of Kyutai
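A rough sketch of how a semantic end-of-turn predictor can sit on top of a streaming STT model; the feature source, classifier, and threshold here are assumptions for illustration, not the released detector.

```python
import torch
import torch.nn as nn

class ToySemanticVAD(nn.Module):
    """Toy end-of-turn predictor: reads a per-frame hidden state from the streaming STT
    model and outputs P(the user is done talking). Illustrative only, not Kyutai's model."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 128), nn.GELU(), nn.Linear(128, 1))

    def forward(self, frame_state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.head(frame_state)).squeeze(-1)

vad = ToySemanticVAD()
END_OF_TURN_THRESHOLD = 0.8   # assumption: tuned per application

for t in range(100):                       # one iteration per streamed audio frame
    frame_state = torch.randn(1, 512)      # stand-in for the STT hidden state at frame t
    if vad(frame_state).item() > END_OF_TURN_THRESHOLD:
        break   # hand the turn to the LLM now, instead of waiting on a fixed silence timeout
```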
Today we are releasing two models. The first one is a 2.6B English-only model that beats Whisper Large v3 on benchmarks even though it’s a streaming model that doesn’t process all the audio at once. It can process 400 sequences in parallel on a single H100.
Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here: https://t.co/bQMP56XaKC
Using Unmute with a custom voice and prompt to create a very intense ice cream seller, inspired by Justin Kuritzkes' sketch🍦