
Patrick Pérez
@ptrkprz
702 Followers · 49 Following · 0 Media · 38 Statuses
AI & CV scientist, CEO at @kyutai_labs
Paris
Joined December 2023
As promised, we are sharing the technology behind Moshi: paper+models+inference code for everyone.
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust and MLX. More details below 🧵 ⬇️ Paper: https://t.co/JQtEMppifK Repo:
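Mimi belongs to the family of neural audio codecs built around residual vector quantization (RVQ): each quantizer stage encodes the residual left over by the previous stages, so a few small codebooks compose into a fine-grained code. A minimal illustration of the general technique in plain Python (a sketch, not Mimi's actual implementation):

```python
def nearest(codebook, v):
    """Index of the codebook vector closest to v (squared L2 distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def rvq_encode(v, codebooks):
    """Residual vector quantization: each stage quantizes what the
    previous stages left over, yielding one index per codebook."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codewords across stages."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out
```

Decoding simply sums the chosen codewords across stages; each added stage refines the reconstruction, which is why a handful of small codebooks can stand in for one enormous one.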
New sharing step on our journey towards easy-to-use fully-open models.
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! https://t.co/X4Dbx2T1cJ
I’ll be presenting a deep dive into how Moshi works at the next NLP Meetup in Paris, this Wednesday the 9th at 7pm. Register if you want to attend! 🧩🔎🟢 https://t.co/1ZPb105JKX
meetup.com
📍 8 rue Cambacérès, 75008 Paris 📆 October 9th, 7:00 p.m. ⚠️ Limited spots available. Be sure to reserve your place in advance! 👥 Alexandre Défossez - Chief Explora
Serious stress testing!
Voice AIs handle speaker turns & interruptions with Voice Activity Detection. VAD is brittle and gets triggered by background noise, creating frequent hiccups. Moshi gets rid of it completely, so you can use it in the most chaotic settings. I myself couldn't hear Moshi here 😅
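To see why VAD is brittle, consider a deliberately naive energy-threshold detector (an illustrative sketch, not any production VAD): any sufficiently loud background noise crosses the threshold and gets mistaken for speech, triggering a spurious speaker turn.

```python
import random

def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)

def vad(frames, threshold=0.01):
    """Naive VAD: a frame counts as 'speech' when its energy
    exceeds a fixed threshold."""
    return [frame_energy(f) > threshold for f in frames]

random.seed(0)
# Ten frames of near-silence vs. ten frames of loud background noise
# (no speech in either case).
quiet = [[random.gauss(0, 0.01) for _ in range(160)] for _ in range(10)]
noisy = [[random.gauss(0, 0.3) for _ in range(160)] for _ in range(10)]
# The quiet frames stay below threshold, but every noise-only frame
# crosses it and would be mistaken for the user starting to speak.
```

Real VADs are smarter than a single energy threshold, but they still draw a hard speech/non-speech boundary, which is exactly what a noisy café or conference hall defeats.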
Moshi is a very nice/fun conversational AI audio 🔊 model release from @kyutai_labs. Are you slowly losing faith in the objective reality and existence of Advanced Voice Mode? Talk to Moshi instead :) You can talk to it on their website: https://t.co/OQpIaXx8wL Or even locally
Moshi can even be explored on a vacation beach or in a conference center, as it is robust to noisy environments
"Hippie" Moshi tells its love for Hendrix...but "skeptical" Moshi is less enthusiastic about psychedelic rock. Moshi can play 70+ emotions, will you catch them all? Try now at https://t.co/lU2sqa8wMQ
Staying in real-time connection with voice AI in Paris while being in Vienna
The attentive listener will notice that even when speaking over Alex, Moshi still listens (taking into account the "in space" instruction for the second poem)
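The trick behind this full-duplex behavior is that the model sees both sides of the conversation at once. As a toy illustration of the multi-stream idea (a simplification, not Moshi's actual frame layout), two token streams can be interleaved so that a single autoregressive model attends to the user's audio even while it is generating its own:

```python
def interleave(user_stream, model_stream):
    """Merge two equal-length token streams into one sequence,
    one (user, model) pair per time step, so a single autoregressive
    model conditions on both sides of the conversation at every step."""
    assert len(user_stream) == len(model_stream)
    merged = []
    for u, m in zip(user_stream, model_stream):
        merged.extend([("user", u), ("model", m)])
    return merged
```

Because the user's tokens are present at every step, "speaking over" the model just changes what it conditions on; there is no separate turn-taking switch to flip.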
Some Moshi extracts! Get your own at https://t.co/SVQZQ9UlEN Don't forget to click "Download video" at the end (if it's good) 🟢
And our demo runs in the US thanks to a donation from @huggingface
Thanks @Thom_Wolf! Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point… not a productized conversational bot.
The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people missed in the room. Mostly irrelevant are the facts that: - they come a few weeks after OpenAI ChatGPT-4o - the demo was less polished than the 4o one (in terms of voice quality, voice
Research internships at @kyutai_labs are fun, besides the hard work! A good session by @RamaAdrien
Moshi is not an assistant, but rather a prototype for advancing real-time interaction with machines. It can chit-chat, discuss facts and make recommendations, but a more groundbreaking ability is its expressivity and spontaneity, which allow for engaging in fun roleplay.
It feels so good to have shared at last what we have been up to over the past 6 months. We worked hard on this unique voice AI, carefully training it on a mix of text and speech, making it multi-stream and real-time, and putting it in an online demo for everyone to experience.
Yesterday we introduced Moshi, the lowest-latency conversational AI ever released. Moshi can make small talk, explain various concepts, and engage in roleplay with many emotions and speaking styles. Talk to Moshi here https://t.co/a4EbAQiih7 and learn more about the method below 🧵.
Please @abursuc keep one for me!
We've just launched our BRAVO robustness and reliability challenge for semantic segmentation. @tuan_hung_vu and I will be giving away these nice stickers at @CVPR. Ping us or catch us at the posters to find out more! #CVPR2024
We are releasing 4M-21 with a permissive license, including its source code and trained models. It's a pretty effective multimodal model that handles tens of tasks & modalities. See the demo code, sample results, and the tokenizers of diverse modalities on the website. IMO, the
We are releasing the 1st version of 4M, a framework for training multimodal foundation models across tens of modalities & tasks, based on scalable masked modeling. Joint effort by @EPFL_en & @Apple. 4M: Massively Multimodal Masked Modeling 🌐 https://t.co/usE17pnXf9 🧵1/n
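At its core, the masked modeling objective mentioned above hides a random subset of tokens and trains the model to reconstruct them from the rest. A minimal single-stream sketch of that training setup (hypothetical helper names, not the 4M code):

```python
import random

def mask_tokens(tokens, mask_ratio=0.5, mask_token="<MASK>", seed=0):
    """Masked-modeling setup: hide a random subset of tokens; the
    model's training target is to reconstruct the hidden ones."""
    rng = random.Random(seed)
    n_mask = int(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [mask_token if i in masked_idx else t
              for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked_idx}
    return inputs, targets
```

The returned `targets` dict is what the loss is computed over; what makes 4M "massively multimodal" is applying this recipe jointly across token streams from many modalities, which this toy single-stream version leaves out.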
📢We introduce the ScaLR models (code+checkpoints) for LiDAR perception distilled from vision foundation models tl;dr: don’t neglect the choice of teacher, student, and pretraining datasets -> their impact is probably more important than the distillation method #CVPR2024 🧵 [1/8]
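The distillation setup the thread refers to boils down to making the LiDAR student's per-point features match the frozen vision teacher's features for the same points. A minimal version of such a feature-alignment loss (an illustrative sketch, not the ScaLR implementation):

```python
def mse(a, b):
    """Mean squared error between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_feats, teacher_feats):
    """Average per-point MSE between student features (e.g. from a
    LiDAR backbone) and frozen teacher features (e.g. from a vision
    foundation model) projected onto the same 3D points."""
    per_point = [mse(s, t) for s, t in zip(student_feats, teacher_feats)]
    return sum(per_point) / len(per_point)
```

With the loss this simple, the thread's tl;dr follows naturally: most of the performance headroom sits in the choice of teacher, student, and pretraining data rather than in the loss itself.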
We’ve got multiple PhD and postdoc positions funded by my #ERCstg project ENSURE. If you’re interested in computer vision and self-driving, please consider applying. Graduate students: apply ASAP! Details at https://t.co/LmhaOEeXuL Postdocs: send me an email with your CV and
1/ Today the UK's AI Safety Institute is open-sourcing our safety evaluations platform. We call it "Inspect":
gov.uk
The AI Safety Institute has open-sourced a new testing platform to strengthen AI safety evaluations.