Neil Zeghidour Profile
Neil Zeghidour

@neilzegh

Followers 4K · Following 2K · Media 47 · Statuses 472

Founder @kyutai_labs working on audio generation. Co-invented SoundStream, AudioLM, MusicLM, Moshi. Previously @GoogleDeepMind and @metaai.

Paris
Joined January 2020
@neilzegh
Neil Zeghidour
8 months
Today we release Hibiki, real-time speech translation that runs on your phone. Adaptive flow without a fancy policy, just simple temperature sampling of a multistream audio-text LM. Very proud of @tom_labiausse's work as an intern.
@kyutai_labs
kyutai
8 months
Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧. Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the…
12
53
402
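The "simple temperature sampling" mentioned above is worth unpacking. As a minimal, illustrative sketch only (the function name is made up; this is not Hibiki's actual decoding code), drawing one token per stream from temperature-scaled logits looks like:

```python
import torch

def sample_next_tokens(logits: torch.Tensor, temperature: float = 0.8) -> torch.Tensor:
    """Sample one token per stream from (num_streams, vocab_size) logits.
    Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    the distribution before sampling."""
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)

# Toy usage: two streams (say, text and audio) over a 1024-token vocabulary.
next_tokens = sample_next_tokens(torch.randn(2, 1024))
print(next_tokens.shape)  # torch.Size([2])
```

Lower temperatures make the model commit to its most likely tokens; higher ones inject diversity.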
@kyutai_labs
kyutai
21 hours
🚀 New models: ARC-Encoders. We introduce a lightweight encoder that compresses context into continuous representations for LLMs, reducing inference cost while preserving performance. Our Adaptable text Representations Compressor, named ARC-Encoder, achieves large efficiency gains…
2
23
167
@gen_intuition
General Intuition
2 days
Introducing General Intuition and our $133.7M Seed from Khosla Ventures, General Catalyst, and Raine. We build foundation models and general agents for environments that require deep spatial and temporal reasoning.
106
97
2K
@vvolhejn
Václav Volhejn
10 days
Training a VQ-VAE on Fashion MNIST, for the explainer of neural audio codecs I'm working on at @kyutai_labs
0
4
34
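For context on the VQ-VAE mentioned above: the core of such a model is a quantization bottleneck that snaps each encoder output to its nearest codebook vector and copies gradients straight through the non-differentiable lookup. A generic PyTorch sketch (not Václav's explainer code; the hyperparameters are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal VQ-VAE bottleneck: nearest-codebook-entry lookup with a
    straight-through gradient estimator (van den Oord et al., 2017)."""

    def __init__(self, num_codes: int = 512, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
        self.beta = beta

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) encoder outputs; find the nearest codebook vector.
        dists = torch.cdist(z, self.codebook.weight)  # (batch, num_codes)
        codes = dists.argmin(dim=-1)                  # (batch,)
        z_q = self.codebook(codes)                    # (batch, dim)
        # Codebook loss pulls codes toward encodings; commitment loss
        # (scaled by beta) pulls encodings toward their codes.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # Straight-through estimator: copy gradients from z_q back to z.
        z_q = z + (z_q - z).detach()
        return z_q, codes, loss

# Toy usage on random "encoder outputs".
vq = VectorQuantizer()
z_q, codes, loss = vq(torch.randn(8, 64))
print(z_q.shape, codes.shape, loss.item())
```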
@honualx
Alexandre Défossez
16 days
We released the pre-print describing the TTS and ASR behind our recent Unmute demo: https://t.co/TdfE4yw5lK 👨‍🔬🔬 We leverage the same architecture as in Moshi, with the text tokens either lagging the audio tokens (ASR) or running ahead of them (TTS). 👇 @kyutai_labs @neilzegh
1
2
27
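The late/early text tokens described above can be pictured as two token streams paired up at a fixed offset. A toy sketch under assumptions of mine (the `PAD` id, function name, and per-step pairing are illustrative, not Kyutai's implementation):

```python
PAD = 0  # hypothetical padding token id, for illustration only

def align_streams(audio: list[int], text: list[int], text_delay: int) -> list[tuple[int, int]]:
    """Pair an audio-token stream with a text-token stream at a fixed offset,
    padding whichever stream starts later. A positive `text_delay` makes text
    lag the audio (ASR-style); a negative one makes it lead (TTS-style)."""
    if text_delay >= 0:
        text = [PAD] * text_delay + text
    else:
        audio = [PAD] * -text_delay + audio
    length = max(len(audio), len(text))
    audio = audio + [PAD] * (length - len(audio))
    text = text + [PAD] * (length - len(text))
    return list(zip(audio, text))

# ASR-style: each text token arrives two steps after its audio.
print(align_streams([11, 12, 13, 14], [21, 22, 23, 24], text_delay=2))
# TTS-style: each text token arrives two steps ahead of its audio.
print(align_streams([11, 12, 13, 14], [21, 22, 23, 24], text_delay=-2))
```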
@SpiralDB
Spiral
1 month
We're building the data infrastructure that AI actually needs. Current systems were built for humans reading dashboards. But an H100 can consume 4 million images per second. The future isn't human-scale. It's machine-scale. Introducing Spiral: Data 3.0 🌀 1/8
13
38
386
@GoogleDeepMind
Google DeepMind
2 months
What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵
835
3K
14K
@tom_labiausse
Tom Labiausse
3 months
I’m happy to share that I’ll be attending ICML 2025 in Vancouver next week to present 𝐇𝐢𝐛𝐢𝐤𝐢 [https://t.co/q50LBOvM1h] 🇫🇷🇬🇧, Kyutai’s real-time and expressive speech translation system. I'll be presenting the poster on Wednesday, July 16 at 4:30PM, feel free to stop by! 💬
2
9
61
@Thom_Wolf
Thomas Wolf
3 months
Thrilled to finally share what we've been working on for months at @huggingface 🤝 @pollenrobotics. Our first robot: Reachy Mini. A dream come true: cute and low-priced, hackable yet easy to use, powered by open source and the infinite community. Tiny price, small size, huge…
239
528
3K
@Thom_Wolf
Thomas Wolf
3 months
We’re releasing the top 3B model out there. SOTA performance. It has dual-mode reasoning (with or without think), extended long context up to 128k, and it’s multilingual with strong support for en, fr, es, de, it, pt. What more do you need? Oh yes, we’re also open-sourcing all…
15
83
478
@neilzegh
Neil Zeghidour
4 months
After releasing the best streaming STT, we’re releasing our state-of-the-art TTS and the code to run Unmute yourself and host your own custom voice AIs. Enjoy!
@kyutai_labs
kyutai
4 months
Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users with a 350ms latency on a single L40S. Try it out and get started on the project page: https://t.co/B4P9FuOrQc
3
4
51
@neilzegh
Neil Zeghidour
4 months
lmao just found out I trained a multilingual ASR without knowing it 😭. This is most likely a side-effect of non-FR/EN examples slipping through our language filtering, and just makes me want to train a proper multilingual one with @n0mad_0. Thanks @TommyFalkowski!
@TommyFalkowski
Tommy Falkowski
4 months
It works! We can just inject a foreign-language audio snippet at the beginning to make the Kyutai STT 1B real-time model use languages other than English and French. I got it working for Spanish, German, and Japanese! @kyutai_labs
5
3
36
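A rough sketch of the trick Tommy describes: splice a short snippet of target-language audio in front of the input before transcribing. The function is mine for illustration, and `transcribe` is a hypothetical stand-in, not a real Kyutai entry point:

```python
import numpy as np

def prepend_language_prompt(prompt: np.ndarray, speech: np.ndarray,
                            sample_rate: int = 24000,
                            gap_seconds: float = 0.5) -> np.ndarray:
    """Splice a short target-language audio snippet in front of the input
    so a streaming STT model conditions on that language from the start.
    Both arrays are mono waveforms at the same sample rate."""
    gap = np.zeros(int(gap_seconds * sample_rate), dtype=speech.dtype)
    return np.concatenate([prompt, gap, speech])

# Hypothetical usage; `transcribe` stands in for however you run the model:
# text = transcribe(prepend_language_prompt(spanish_snippet, spanish_speech))
```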
@awnihannun
Awni Hannun
4 months
New speech-to-text model from @kyutai_labs. The best streaming model on the HF leaderboard, and it runs on your Mac or iPhone with MLX:
@kyutai_labs
kyutai
4 months
Our latest open-source speech-to-text model just claimed 1st place among streaming models and 5th place overall on the OpenASR leaderboard 🥇🎙️. While all other models need the whole audio, ours delivers top-tier accuracy on streaming content. Open, fast, and ready for production!
1
10
117
@n0mad_0
ëugene kharitonov 🏴‍☠️
4 months
A quick second of fame: Kyutai-STT gets into the top 5 of the OpenASR leaderboard, and unlike the others, our model works in a streaming regime and starts transcribing before the entire audio is available!
@kyutai_labs
kyutai
4 months
Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here: https://t.co/bQMP56XaKC
2
9
24
@neilzegh
Neil Zeghidour
4 months
My latest work in ASR was with waveform-based CNNs back in 2018. Got back to it to ship yet another uniquely capable model based on Moshi's framework and powering https://t.co/QF9VcENGxk! Batched, streaming inference (you typically get only one of the two) is the game changer here.
@kyutai_labs
kyutai
4 months
Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here: https://t.co/bQMP56XaKC
2
6
46
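To see why batched plus streaming matters: many streaming stacks serve one session at a time, while batch stacks need the full audio up front. A hypothetical sketch of doing both at once, where `step_fn` stands in for a model's incremental forward pass (not Kyutai's API):

```python
import numpy as np

def batched_streaming_transcribe(streams, step_fn, chunk: int = 1920):
    """Drive several audio streams through an incremental model in
    lock-step: at each tick, take one `chunk`-sample frame from every
    stream, stack them into a batch, and run a single forward pass.
    `step_fn(batch)` returns one partial transcript piece per element."""
    n = min(len(s) for s in streams) // chunk
    transcripts = [[] for _ in streams]
    for t in range(n):
        batch = np.stack([s[t * chunk:(t + 1) * chunk] for s in streams])
        for i, piece in enumerate(step_fn(batch)):
            transcripts[i].append(piece)
    return ["".join(parts) for parts in transcripts]

# Toy usage with a dummy step function over 4 parallel 1-second streams.
dummy = lambda batch: [f"[{len(b)} samples] " for b in batch]
streams = [np.zeros(24000, dtype=np.float32) for _ in range(4)]
print(batched_streaming_transcribe(streams, dummy)[0])
```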
@laurentsifre
Laurent Sifre
5 months
🤗 We're open-sourcing Holo1 model weights & the WebClick dataset to accelerate agentic AI research! Find them on our HuggingFace page.
1
6
24
@neilzegh
Neil Zeghidour
5 months
There is so much more to voice interaction than assistants. Get creative with https://t.co/QF9VcEN8HM.
@kyutai_labs
kyutai
5 months
Using Unmute with a custom voice and prompt to create a very intense ice cream seller, inspired by Justin Kuritzkes' sketch 🍦
0
2
13
@neilzegh
Neil Zeghidour
5 months
Unmute is our new cascaded voice assistant: fast, accurate, and flexible. It doesn't have Moshi's full-duplex interaction and zero latency, but you can change the voice with a 10s sample and plug in any LLM. A good playground for testing custom voice AIs.
@kyutai_labs
kyutai
5 months
Talk to https://t.co/CpQTspHXbi 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the…
2
9
66
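The modularity claim above boils down to a three-stage cascade: speech to text, text to reply, reply to speech. A minimal sketch with hypothetical stand-in components, not Kyutai's actual interfaces:

```python
from typing import Callable
import numpy as np

def cascaded_turn(
    audio_in: np.ndarray,
    stt: Callable[[np.ndarray], str],
    llm: Callable[[str], str],
    tts: Callable[[str], np.ndarray],
) -> np.ndarray:
    """One turn of a cascaded voice assistant: transcribe the user's
    speech, let a text LLM write the reply, then synthesize it. Swapping
    `llm` changes the assistant's brain without touching the voice stack."""
    transcript = stt(audio_in)
    reply = llm(transcript)
    return tts(reply)

# Dummy components so the sketch runs end to end.
audio = np.zeros(24000, dtype=np.float32)
out = cascaded_turn(
    audio,
    stt=lambda a: "hello",
    llm=lambda t: f"you said: {t}",
    tts=lambda t: np.zeros(24000, dtype=np.float32),
)
print(out.shape)
```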
@neilzegh
Neil Zeghidour
6 months
We are releasing a multilingual 2B model focused on EU languages, but maybe more importantly, we're making our pretraining data pipeline public. Just run `uv run dactory create /path/to/dest` and get Helium's pretraining data.
@kyutai_labs
kyutai
6 months
🚀 Thrilled to announce Helium 1, our new 2B-parameter LLM, now available alongside dactory, an open-source pipeline to reproduce its training dataset covering all 24 EU official languages. Helium sets new standards within its size class on European languages!
0
12
97
@lostanlen
Vincent Lostanlen
6 months
Our team is hiring a postdoc in Audio AI!
What: speech, music, bioacoustics
How: multiresolution neural networks in the raw waveform
Where: Nantes, France (https://t.co/tbNcztQDWh)
When: negotiable
How long: 12 months, renewable
Apply before May 10: https://t.co/YuRvlVL4SF
1
19
53
@neilzegh
Neil Zeghidour
6 months
Thanks @GoogleAI 🙏, I'm proud to see concepts introduced in this paper (RVQ-VAE, quantizer dropout) still being as relevant four years later, and in particular how RVQ turned out to be a perfect fit for audio language models.
@GoogleAI
Google AI
6 months
Congratulations to Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi for winning the IEEE Best Paper Award for "SoundStream: An End-to-End Neural Audio Codec"! https://t.co/5C9HuwYFjJ #SPSAwards #IEEEAwards
3
13
184
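For readers new to the two ideas named above: in residual vector quantization, each codebook quantizes what the previous codebooks left unexplained, and quantizer dropout truncates the codebook stack at random during training so one model covers many bitrates. An illustrative sketch that omits the commitment losses and codebook-update machinery a real implementation needs:

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    """Sketch of residual vector quantization (RVQ) with quantizer dropout,
    the two SoundStream ideas mentioned above."""

    def __init__(self, num_quantizers: int = 8, num_codes: int = 1024, dim: int = 128):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(num_codes, dim) for _ in range(num_quantizers)
        )

    def forward(self, z: torch.Tensor):
        # z: (batch, dim). Quantizer dropout: at train time, use a random
        # prefix of the codebooks; at eval time, use all of them.
        n_q = (torch.randint(1, len(self.codebooks) + 1, ()).item()
               if self.training else len(self.codebooks))
        residual, quantized, codes = z, torch.zeros_like(z), []
        for cb in self.codebooks[:n_q]:
            # Quantize the current residual, then subtract what was captured.
            idx = torch.cdist(residual, cb.weight).argmin(dim=-1)
            q = cb(idx)
            quantized = quantized + q
            residual = residual - q
            codes.append(idx)
        return quantized, torch.stack(codes, dim=-1)

# Toy usage: eval mode uses the full stack of 8 codebooks.
rvq = ResidualVQ().eval()
zq, codes = rvq(torch.randn(4, 128))
print(zq.shape, codes.shape)  # (4, 128), (4, 8)
```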