kyutai @kyutai_labs tweet - Talk to https://t.co/CpQTspHXbi 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the https://t.co/ZatL9tS1OG

kyutai

@kyutai_labs

5 months

“But what about Moshi?” Last year we unveiled Moshi, the first audio-native model. While Moshi provides unmatched latency and naturalness, it doesn’t yet match the extended abilities of text models such as function-calling, stronger reasoning capabilities, and in-context

kyutai

@kyutai_labs

5 months

Unmute’s speech-to-text is streaming, accurate, and includes a semantic VAD that predicts whether you’ve actually finished speaking or if you’re just pausing mid-sentence, meaning it’s low-latency but doesn’t interrupt you.
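Editor's sketch: one way to picture the turn-taking decision described in this tweet is to combine the length of the silence with a score for whether the utterance sounds finished. This is only an illustration under assumptions, not Unmute's implementation: `TurnPolicy`, `should_respond`, the `end_of_turn_prob` input, and every threshold below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnPolicy:
    # All thresholds are illustrative assumptions, not values from Unmute.
    min_silence_ms: int = 200          # gaps shorter than this are ignored
    max_silence_ms: int = 2000         # after this long, respond regardless
    end_of_turn_threshold: float = 0.7


def should_respond(silence_ms: int, end_of_turn_prob: float,
                   policy: TurnPolicy = TurnPolicy()) -> bool:
    """Decide whether the assistant should start speaking.

    silence_ms: how long the user has been silent so far.
    end_of_turn_prob: hypothetical model score in [0, 1] that the
        utterance is complete (the "semantic" part of the VAD).
    """
    if silence_ms < policy.min_silence_ms:
        return False   # too short to be a real pause
    if silence_ms >= policy.max_silence_ms:
        return True    # fallback: very long silence ends the turn
    # In between, defer to the semantic score instead of a fixed timeout.
    return end_of_turn_prob >= policy.end_of_turn_threshold
```

With these assumed thresholds, a 500 ms pause after a sentence that sounds complete triggers a reply, while the same pause mid-sentence does not.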

kyutai

@kyutai_labs

5 months

The text LLM’s response is passed to our TTS, conditioned on a 10s voice sample. We’ll provide access to the voice cloning model in a controlled way. The TTS is also streaming *in text*, reducing the latency by starting to speak even before the full text response is generated.
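Editor's sketch: the latency benefit of feeding text to the TTS as it arrives can be illustrated with a small generator that forwards text as soon as a sentence boundary appears in the LLM's token stream. This swaps in simple sentence-boundary chunking for illustration; it is not Kyutai's TTS interface, and `speakable_chunks` is a hypothetical helper.

```python
from typing import Iterable, Iterator

# Naive sentence terminators; a real system would need to handle
# abbreviations, decimals, and similar cases.
SENTENCE_END = (".", "!", "?")


def speakable_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield text chunks as soon as a sentence boundary is seen,
    without waiting for the full response."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the stream ends mid-sentence.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hello", " there", ".", " How", " are", " you", "?"]
    for chunk in speakable_chunks(stream):
        print(chunk)  # each chunk could be handed to a TTS engine here
```

The first chunk is available after three tokens, so speech can begin while the rest of the response is still being generated.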

kyutai

@kyutai_labs

5 months

What’s next? We strongly believe that the future of human-machine interaction lies in natural, full-duplex speech interactions, coupled with customization and extended abilities. Stay tuned for what’s to come!

David

@DavidSHolz

5 months

@kyutai_labs super cool

Agent Joshua ₱

@hashwarlock

5 months

@kyutai_labs I asked, "If you were your creator and you wanted to build an agent that could hit boundaries & grow instead of stop at your limit, what would you design, and what unproven ideas would you try to attempt?" Impressed with the conversation flow & depth of their knowledge. A

Linus ✦ Ekenstam

@LinusEkenstam

5 months

@kyutai_labs I like this evolution of Moshi. I want it, because all the frontier models' advanced voice modes are lobotomized.

Kol Tregaskes

@koltregaskes

5 months

@kyutai_labs I love it. 😀

Damien C. Tanner

@dctanner

5 months

@kyutai_labs This looks awesome. We’d love to add support for these models to @uselayercode

asim ᯅ

@tweetsfromasim

5 months

@kyutai_labs super impressive!

Umesh

@umesh_ai

5 months

@kyutai_labs So cool!

Josh Whiton

@joshwhiton

4 months

@kyutai_labs So excited for this; you're the only account I have notifications turned on for. But please try to find a way for it to handle silence. No matter what I say, "Alright, take all the time you need" is immediately followed by "Are you there?" and so on, without end.

Josh Whiton

@joshwhiton

5 months

@kyutai_labs Great start, very useful. But it needs to be able to handle silence. "Don't talk, I'm thinking" never results in more than a few seconds of peace before it interjects.

Andrew Hart

@AndrewHartAR

5 months

@kyutai_labs Had a lot of fun playing the soulless quiz.

Alpaca Network

@AlpacaNetworkAI

5 months

@kyutai_labs Love this direction — voice is such a natural interface for agents! Open-sourcing is even better. Next step? Owning the models behind them. 🧠

The Canaanite

@mysticaltech

5 months

@kyutai_labs Amazing work

Shiraz Akmal

@ShirazAkmal

5 months

@kyutai_labs Nice!

ManyMangoes Pty Ltd

@somanymangoes

4 months

@kyutai_labs Voice AI just leveled up.

julien lesaicherre 🇺🇦

@jlesaicherre

5 months

@kyutai_labs Go team!

BowtiedWhitebat + Read Pinned Tweet or NGMI

@bowtiedwhitebat

5 months

@kyutai_labs that + this

BowtiedWhitebat + Read Pinned Tweet or NGMI

@bowtiedwhitebat

5 months

@kyutai_labs can we make ai jesus talking nonstop?

Rodri Mora aka Bullerwins

@rodrimora

5 months

@kyutai_labs Mandatory "Her" voice test

Rodri Mora aka Bullerwins

@rodrimora

5 months

@kyutai_labs What I found most interesting is the VAD. It works well, pausing and responding appropriately. Any plans to open-source it?

ZAZA

@OpinionAILtd

5 months

@kyutai_labs Dear team, any ETA on the code?

Tom

@ThomasCsere

5 months

@kyutai_labs Very cool, can't wait to try it. What's the preferred hardware to run this for each model?

TigerHix

@TigerHixTang

5 months

@kyutai_labs Very cool work, but the AI voice often stops abruptly when speaking. I was testing the "Dev (News)" option. Is this an implementation error in the cascaded system, or a limitation of the TTS?

Viktor Andreas

@elyonviktor

5 months

@kyutai_labs Fantastic, exactly what was missing in the ecosystem!

Amigoz

@ashishblessings

5 months

@kyutai_labs This is absolutely brilliant. I have been trying Gemini Live and GPT-realtime, but they are too costly and the voice is not natural enough for casual talk. How big are these models? Will you also release docs on how to self-host?

Quantum Daddy

@LegalPrimes

5 months

@kyutai_labs Exciting stuff!

stephen 🌿

@stevelizcano

5 months

@kyutai_labs @super_bavario

Unity Eagle

@UnityEagle

5 months

@kyutai_labs Just gave it a try and I’m impressed 🤩

Brian

@bf6x0

5 months

@kyutai_labs how does it perform inside a moving car? had a long drive a couple weeks ago and tried chatgpt voice mode to pass the time, but the road noise kept interrupting it and making it re-start its responses. can you filter for just voices?

KC

@kcwolfy_

5 months

@kyutai_labs Great job @kyutai_labs team. By far the most natural-feeling conversation I have had with an AI to date. 👏

MetaMike

@metx_mike

5 months

@kyutai_labs Is there an api available? I didn't see much info but maybe I missed it

Simon

@AI_Homelab

5 months

@kyutai_labs @JagersbergKnut Woah! 😃 Looking forward to it! =)

toughyear

@toughyear

5 months

@kyutai_labs pretty awesome work. the mobile UI is slightly broken but otherwise pretty good.
