Alan Cowen
@AlanCowen
Followers: 3K · Following: 6K · Media: 97 · Statuses: 1K
CEO @hume_ai, teaching AI to make people happy. AI researcher + emotion scientist
Joined February 2015
text-to-speech at LLM scale is basically a simulation of human cognition and emotion
Today, we’re releasing Octave: the first LLM built for text-to-speech.
🎨 Design any voice with a prompt
🎬 Give acting instructions to control emotion and delivery (sarcasm, whispering, etc.)
🛠️ Produce long-form content on our Creator Studio
Unlike traditional TTS that just…
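To make "design a voice with a prompt" and "acting instructions" concrete, here is a minimal sketch of what such a request could look like over HTTP. The endpoint path, auth header, and field names (`utterances`, `description`) are assumptions inferred from the tweet, not a verified reference; Hume's documentation is the source of truth.

```python
# Minimal sketch: prompt-designed voice plus acting instructions for Octave TTS.
# Endpoint, header, and field names are assumptions -- check Hume's documentation.
import os
import requests

resp = requests.post(
    "https://api.hume.ai/v0/tts",                             # assumed endpoint
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},   # assumed auth header
    json={
        "utterances": [
            {
                "text": "Oh, sure. That's a *great* idea.",
                # One natural-language prompt doubling as voice design
                # and acting instructions (sarcastic, whispered delivery).
                "description": "A dry, sarcastic narrator, almost whispering",
            }
        ]
    },
    timeout=30,
)
resp.raise_for_status()
audio = resp.json()  # response shape is also an assumption
```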
One performance, infinite voices. Voice Conversion is now live on Hume’s creator studio and API! Generate the same pacing, pronunciation, and intonation with one recording across any voice you choose. Hear it for yourself ⬇️
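For illustration, a voice-conversion call might send one recorded performance plus a target voice and get back audio with the same pacing, pronunciation, and intonation. Everything below (the path, the multipart fields, the response being raw audio) is hypothetical, not a documented contract.

```python
# Hypothetical voice-conversion request: one performance, a different voice.
# Path, parameter names, and response format are assumptions for illustration.
import os
import requests

with open("performance.wav", "rb") as f:
    resp = requests.post(
        "https://api.hume.ai/v0/voice-conversion",       # hypothetical path
        headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},
        files={"audio": f},                               # the source recording
        data={"target_voice": "Warm British narrator"},   # hypothetical field
        timeout=60,
    )
resp.raise_for_status()

with open("converted.wav", "wb") as out:
    out.write(resp.content)  # assumes the body is the converted audio
```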
Excited to be powering @NianticSpatial's Dot
Today, @NianticSpatial released an update to their AR companion, Dot, at Snap's Lens Fest, with new voice capabilities powered by Hume AI. Dot's new interactive dialogue capabilities allow the AI companion to guide users through physical spaces, offering contextual information
we've been working on making voice AI faster and more realistic. we're hoping that scaling voice will give AI compatibility with the human psyche: every voice session is time-locked, interruptible, and rife with feedback. but first, we need higher quality voice experiences.
Introducing Octave 2: our next-generation multilingual text-to-speech model
What’s new:
- Fluent in 11+ languages
- 40% faster (<200ms latency) & 50% cheaper than Octave 1
- Multi-speaker conversation
- More reliable pronunciation
- New voice conversion & phoneme editing
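As a sketch of the multi-speaker feature, the same assumed request shape as in the earlier example could carry one utterance per speaker, with each `description` acting as that speaker's voice prompt; the field names remain assumptions rather than documented API details.

```python
# Hypothetical multi-speaker conversation request for Octave 2.
# Payload shape and field names are assumptions, not verified API details.
import os
import requests

payload = {
    "utterances": [
        {"text": "Bonjour ! On commence la démo ?", "description": "Cheerful French host"},
        {"text": "Absolutely, let's dive in.", "description": "Calm, low-pitched co-host"},
    ]
}

resp = requests.post(
    "https://api.hume.ai/v0/tts",                             # assumed endpoint
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
```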
Today @hume_ai is releasing its latest text-to-audio and audio-to-text AI models. Remember this company? It brings emotions into your audio in a way others don't. Now it's even better, and here founder @AlanCowen gives me a deep dive into the audio/AI space and an update on their
SambaNova 🤝 @Hume_AI
Imagine if the world's most realistic voice AI was also super fast, ridiculously smart, and crazy affordable 🤯 Well, we made it happen... and you can see it (& hear it 😉) for yourself 👇
Use OpenAI's new open source model with any voice, cloned or designed!
You can use Cerebras' gpt-oss-120b to build realistic speech-to-speech voice interfaces with emotion via @hume_ai's new EVI 3. Perfect for your next voice AI project! Link to try below 👇
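One way this could be wired up, as a sketch: create an EVI configuration (in the Hume dashboard or API) that routes language-model calls to Cerebras' gpt-oss-120b, then open a client WebSocket session against that config. The URL, query parameters, and message format below are assumptions, not verified connection details.

```python
# Sketch of a client connecting to an EVI session whose config (assumed to be
# created separately) points at Cerebras' gpt-oss-120b. URL and query params
# are assumptions; real connection details are in Hume's EVI docs.
import asyncio
import os

import websockets  # pip install websockets


async def main() -> None:
    url = (
        "wss://api.hume.ai/v0/evi/chat"                    # assumed URL
        f"?api_key={os.environ['HUME_API_KEY']}"
        f"&config_id={os.environ['EVI_CONFIG_ID']}"        # config backed by gpt-oss-120b
    )
    async with websockets.connect(url) as ws:
        # A real voice client streams microphone audio up and plays synthesized
        # audio back; this sketch just prints the first server event and exits.
        print(await ws.recv())


asyncio.run(main())
```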
2024: Voice Cloning
2025: What about personality cloning?
Hume’s voice AI can now mimic not only your voice but also your speaking style and language. It’s now available via our TTS and our new speech-to-speech model, EVI 3, which is also launching today.
We've rolled out significant improvements to the @hume_ai TTS integration in Vapi. The latest update delivers:
• 66% lower latency
• 41% lower cost per minute
• More efficient voice interactions
This aligns with our ongoing effort to enhance speed, reliability, and
Just launched TTS Arena V2!
🎙️ Blind A/B voting for TTS models (open + closed)
🗣️ New: Conversational Arena (CSM 1B, Dia 1.6B, more)
📊 Personal Leaderboard to track your favorites (optional login)
⚡ Rebuilt from scratch - faster + keyboard shortcuts
.@hume_ai's @gagnecr13 & @inafried test out Hume AI's EVI 3 AI model that's optimized to understand and express human emotions at #AxiosAISummit
Emotive voice AI startup Hume launches new EVI 3 model with rapid custom voice creation
We think everyone should have a unique, trusted AI they recognize by voice
Meet EVI 3, another step toward general voice intelligence. EVI 3 is a speech-language model that can understand and generate any human voice, not just a handful of speakers. With this broader voice intelligence comes greater expressiveness and a deeper understanding of tune,
What’s the interface for the future of AI? It isn’t a chatbot. It needs to be:
1. context-rich (sees what you’re seeing)
2. passively helpful (e.g., gives directions when you need them)
3. unobtrusive (casual, quiet, and minimally distracting)
Maybe something like this?
Good thing Hume’s Octave TTS is more emotionally intelligent than… this unhinged therapist
Octave TTS is now 𝑓𝑎𝑠𝑡
Introducing Octave Instant Mode. Now, the highest quality TTS on the market runs in under 250ms… 4x faster than the second-best model… while retaining all the nuance, emotion, and personality that sets it apart.
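If Instant Mode is exposed as a request option, enabling it might look like a single flag on the same assumed TTS payload used earlier; the flag name and its behavior below are hypothetical, included only to illustrate the low-latency path.

```python
# Hypothetical request opting into Octave's low-latency "instant" path.
# The instant_mode flag and the endpoint are assumptions for illustration.
import os
import requests

resp = requests.post(
    "https://api.hume.ai/v0/tts",                             # assumed endpoint
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},
    json={
        "utterances": [{"text": "Low-latency reply, ready in a blink."}],
        "instant_mode": True,   # hypothetical flag for the sub-250ms path
    },
    timeout=10,
)
resp.raise_for_status()
```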