@kyutai_labs
kyutai
5 months
Talk to https://t.co/CpQTspHXbi 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the
117
264
2K
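
The "wrap any text LLM with speech-to-text and text-to-speech" idea in the post above can be sketched as plain function composition. This is only an illustration under stated assumptions: `make_voice_agent`, `stt`, `llm`, and `tts` are hypothetical names, not Unmute's actual API, and the real system is streaming rather than turn-by-turn.

```python
from typing import Callable


def make_voice_agent(
    stt: Callable[[bytes], str],   # hypothetical speech-to-text stage
    llm: Callable[[str], str],     # any text-in, text-out model
    tts: Callable[[str], bytes],   # hypothetical text-to-speech stage
) -> Callable[[bytes], bytes]:
    """Compose three stages into one audio-in, audio-out function."""

    def respond(audio_in: bytes) -> bytes:
        transcript = stt(audio_in)   # audio -> text
        reply = llm(transcript)      # text -> text; the LLM never sees audio
        return tts(reply)            # text -> audio

    return respond


# Stand-in stages so the sketch runs without any real models.
agent = make_voice_agent(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: text.upper(),
    tts=lambda text: text.encode("utf-8"),
)
```

Because the LLM only sees and produces text, swapping in a different model means changing the `llm` argument and nothing else.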

Replies

@kyutai_labs
kyutai
5 months
“But what about Moshi?” Last year we unveiled Moshi, the first audio-native model. While Moshi provides unmatched latency and naturalness, it doesn’t yet match the extended abilities of text models such as function-calling, stronger reasoning capabilities, and in-context
1
1
56
@kyutai_labs
kyutai
5 months
Unmute’s speech-to-text is streaming, accurate, and includes a semantic VAD that predicts whether you’ve actually finished speaking or if you’re just pausing mid-sentence, meaning it’s low-latency but doesn’t interrupt you.
2
2
72
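
One way to picture the semantic VAD described above is as a silence threshold that shrinks when a model is confident the speaker has finished. The sketch below is a toy under stated assumptions, not Kyutai's method: `TurnDetector`, its thresholds, and the `p_end` probability (which would come from some end-of-turn predictor) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnDetector:
    """Toy end-of-turn detector; both thresholds are illustrative assumptions."""

    min_pause_s: float = 0.2  # shortest silence that can ever trigger a reply
    max_pause_s: float = 2.0  # silence after which a reply is always triggered

    def required_pause(self, p_end: float) -> float:
        """Map a predicted end-of-turn probability to a required silence length."""
        p = min(max(p_end, 0.0), 1.0)  # clamp to [0, 1]
        # Confident the turn ended -> short wait; unsure -> long wait.
        return self.max_pause_s - p * (self.max_pause_s - self.min_pause_s)

    def should_respond(self, pause_s: float, p_end: float) -> bool:
        """Reply only once the observed silence covers the required pause."""
        return pause_s >= self.required_pause(p_end)


detector = TurnDetector()
```

With these made-up defaults, a 0.3 s pause after a sentence that sounds complete (`p_end=0.95`) triggers a reply, while the same pause mid-sentence (`p_end=0.1`) does not.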
@kyutai_labs
kyutai
5 months
The text LLM’s response is passed to our TTS, conditioned on a 10s voice sample. We’ll provide access to the voice cloning model in a controlled way. The TTS is also streaming *in text*, reducing the latency by starting to speak even before the full text response is generated.
6
2
91
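
The latency benefit of a TTS that is "streaming in text" can be illustrated by handing text to the synthesizer in small chunks while the LLM is still generating. This is a minimal sketch under stated assumptions: `stream_text_to_tts` and the `min_words` cutoff are hypothetical, and the real TTS presumably consumes text at a finer granularity than this word-based chunking.

```python
from typing import Iterable, Iterator


def stream_text_to_tts(tokens: Iterable[str], min_words: int = 3) -> Iterator[str]:
    """Yield text for synthesis as soon as `min_words` complete words are buffered."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Only words followed by a space are complete; the tail may still grow.
        head, sep, tail = buffer.rpartition(" ")
        if sep and len(head.split()) >= min_words:
            yield head      # speech can start before the reply is finished
            buffer = tail
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left at the end of the reply


# Stand-in for an LLM token stream.
chunks = list(stream_text_to_tts(["Hel", "lo ", "there, ", "how ", "are ", "you", "?"]))
```

Here the first chunk is available after four tokens, so synthesis could begin while the remaining tokens are still arriving.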
@kyutai_labs
kyutai
5 months
What’s next? We strongly believe that the future of human-machine interaction lies in natural, full-duplex speech interactions, coupled with customization and extended abilities. Stay tuned for what’s to come!
4
2
81
@DavidSHolz
David
5 months
@kyutai_labs super cool
1
0
22
@hashwarlock
Agent Joshua ₱
5 months
@kyutai_labs I asked, "If you were your creator and you wanted to build an agent that could hit boundaries & grow instead of stop at your limit, what would you design, and what unproven ideas would you try to attempt?" Impressed with the conversation flow & depth of their knowledge. A
1
2
12
@LinusEkenstam
Linus ✦ Ekenstam
5 months
@kyutai_labs I like this evolution of Moshi. I want it, because all the frontier models' advanced voice modes are lobotomized.
0
0
4
@koltregaskes
Kol Tregaskes
5 months
@kyutai_labs I love it. 😀
0
0
1
@dctanner
Damien C. Tanner
5 months
@kyutai_labs This looks awesome. We’d love to add support for these models to @uselayercode
0
0
1
@tweetsfromasim
asim ᯅ
5 months
@kyutai_labs super impressive!
0
0
1
@umesh_ai
Umesh
5 months
@kyutai_labs So cool!
0
0
0
@joshwhiton
Josh Whiton
4 months
@kyutai_labs So excited for this, you're the only account I have notifications turned on for. But please try to find a way for it to handle silence. No matter what I say, "Alright, take all the time you need" is immediately followed by "Are you there?" and so on, without end.
0
0
0
@joshwhiton
Josh Whiton
5 months
@kyutai_labs Great start, very useful. But it needs to be able to handle silence. "Don't talk, I'm thinking" never results in more than a few seconds of peace before it interjects.
0
0
0
@AndrewHartAR
Andrew Hart
5 months
@kyutai_labs Had a lot of fun playing the soulless quiz.
0
0
0
@AlpacaNetworkAI
Alpaca Network
5 months
@kyutai_labs Love this direction — voice is such a natural interface for agents! Open-sourcing is even better. Next step? Owning the models behind them. 🧠
0
0
0
@mysticaltech
The Canaanite
5 months
@kyutai_labs Amazing work
0
0
0
@ShirazAkmal
Shiraz Akmal
5 months
@kyutai_labs Nice!
0
0
0
@somanymangoes
ManyMangoes Pty Ltd
4 months
@kyutai_labs Voice AI just leveled up.
0
0
0
@jlesaicherre
julien lesaicherre 🇺🇦
5 months
@kyutai_labs Go team!
0
0
0
@bowtiedwhitebat
BowtiedWhitebat + Read Pinned Tweet or NGMI
5 months
@kyutai_labs that + this
0
0
0
@bowtiedwhitebat
BowtiedWhitebat + Read Pinned Tweet or NGMI
5 months
@kyutai_labs can we make ai jesus talking nonstop?
0
0
0
@rodrimora
Rodri Mora aka Bullerwins
5 months
@kyutai_labs Mandatory "Her" voice test
0
0
20
@rodrimora
Rodri Mora aka Bullerwins
5 months
@kyutai_labs What I found most interesting is the VAD. It works well, pausing and responding appropriately. Any plans to open-source it?
0
0
5
@OpinionAILtd
ZAZA
5 months
@kyutai_labs Dear team, any ETA on the code?
0
0
2
@ThomasCsere
Tom
5 months
@kyutai_labs Very cool, can't wait to try it. What's the preferred hardware to run this for each model?
0
0
2
@TigerHixTang
TigerHix
5 months
@kyutai_labs Very cool work, but the AI voice often stops abruptly when speaking. I was testing the "Dev (News)" option. Is it an implementation error in the cascaded system, or a limitation of the TTS?
1
0
2
@elyonviktor
Viktor Andreas
5 months
@kyutai_labs Fantastic, exactly what was missing in the ecosystem!
0
0
2
@ashishblessings
Amigoz
5 months
@kyutai_labs This is absolutely brilliant. I have been trying Gemini Live and GPT-realtime, but they are too costly and the voice is not natural enough for casual talks. How big are these models? Will you also release docs on how to self-host?
1
0
2
@LegalPrimes
Quantum Daddy
5 months
@kyutai_labs Exciting stuff!
0
0
1
@stevelizcano
stephen 🌿
5 months
0
0
1
@UnityEagle
Unity Eagle
5 months
@kyutai_labs Just gave it a try and I’m impressed 🤩
0
0
1
@bf6x0
Brian
5 months
@kyutai_labs how does it perform inside a moving car? had a long drive a couple weeks ago and tried chatgpt voice mode to pass the time, but the road noise kept interrupting it and making it re-start its responses. can you filter for just voices?
0
0
1
@kcwolfy_
KC
5 months
@kyutai_labs Great job @kyutai_labs team. By far the most natural-feeling conversation I have had with an AI to date. 👏
1
0
1
@metx_mike
MetaMike
5 months
@kyutai_labs Is there an api available? I didn't see much info but maybe I missed it
0
0
1
@AI_Homelab
Simon
5 months
@kyutai_labs @JagersbergKnut Woah! 😃 Looking forward to it! =)
0
0
1
@toughyear
toughyear
5 months
@kyutai_labs pretty awesome work. the mobile UI is slightly broken but otherwise pretty good.
1
0
1