Patrick Pérez

@ptrkprz

Followers
702
Following
49
Media
0
Statuses
38

AI & CV scientist, CEO at @kyutai_labs

Paris
Joined December 2023
@ptrkprz
Patrick Pérez
1 year
As promised, we are sharing the technology behind Moshi: paper+models+inference code for everyone.
@kyutai_labs
kyutai
1 year
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in Pytorch, Rust and MLX. More details below 🧵 ⬇️ Paper: https://t.co/JQtEMppifK Repo:
3
10
112
@ptrkprz
Patrick Pérez
9 months
changing air, entering blue sky, same handle
0
0
3
@ptrkprz
Patrick Pérez
9 months
New sharing step on our journey towards easy-to-use fully-open models.
@kyutai_labs
kyutai
9 months
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! https://t.co/X4Dbx2T1cJ
1
2
14
@honualx
Alexandre Défossez
1 year
I’ll be presenting a deep dive into how Moshi works at the next NLP Meetup in Paris, this Wednesday the 9th at 7pm. Register if you want to attend ! 🧩🔎🟢 https://t.co/1ZPb105JKX
meetup.com
📍8 rue Cambacérès, 75008 Paris 📆 October 9th, 7:00 p.m. ⚠️ Limited spots available. Be sure to reserve your place in advance! 👥 Alexandre Défossez - Chief Explora
5
10
72
@ptrkprz
Patrick Pérez
1 year
Serious stress testing!
@neilzegh
Neil Zeghidour
1 year
Voice AIs handle speaker turns & interruptions with Voice Activity Detection. VAD is brittle and will trigger due to background noise, creating frequent hiccups. Moshi gets rid of it completely, so you can use it in the most chaotic settings. I myself couldn't hear Moshi here 😅
0
0
2
@karpathy
Andrej Karpathy
1 year
Moshi is a very nice/fun conversational AI audio 🔊 model release from @kyutai_labs . Are you slowly losing faith in the objective reality and existence of Advanced Voice Mode? Talk to Moshi instead :) You can talk to it on their website: https://t.co/OQpIaXx8wL Or even locally
@kyutai_labs
kyutai
1 year
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in Pytorch, Rust and MLX. More details below 🧵 ⬇️ Paper: https://t.co/JQtEMppifK Repo:
71
321
3K
@hertzfelt_io
MΔXIMUS HΞRTŻFΞLT
1 year
0
4
9
@ptrkprz
Patrick Pérez
1 year
can even be explored on a vacation beach or a conference center, as Moshi is robust to noisy environments
@kyutai_labs
kyutai
1 year
"Hippie" Moshi tells its love for Hendrix...but "skeptical" Moshi is less enthusiastic about psychedelic rock. Moshi can play 70+ emotions, will you catch them all? Try now at https://t.co/lU2sqa8wMQ
0
0
4
@ptrkprz
Patrick Pérez
1 year
Meet our ambassador!
@kyutai_labs
kyutai
1 year
If you're attending ICML and want to learn more about Kyutai and Moshi, reach out to Edouard!
0
0
1
@ptrkprz
Patrick Pérez
1 year
Staying in real-time connection with voice AI in Paris while being in Vienna
@EXGRV
Edouard Grave
1 year
Moshi goes to #ICML2024 in Vienna! Try the demo at https://t.co/weFG6cmhDT
0
0
2
@ptrkprz
Patrick Pérez
1 year
The attentive listener will notice that even when speaking over Alex, Moshi still listens (taking into account the "in space" instruction for the second poem)
@honualx
Alexandre Défossez
1 year
Some Moshi extracts! Get your own at https://t.co/SVQZQ9UlEN Don't forget to click the "Download video" at the end (if it's good) 🟢
2
1
11
@ptrkprz
Patrick Pérez
1 year
And our demo runs in the US thanks to a donation from @huggingface
@ptrkprz
Patrick Pérez
1 year
Thanks @Thom_Wolf Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point…. not a productized conversational bot.
0
0
5
@ptrkprz
Patrick Pérez
1 year
Thanks @Thom_Wolf Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point…. not a productized conversational bot.
@Thom_Wolf
Thomas Wolf
1 year
The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people missed in the room
Mostly irrelevant are the facts that:
- they come a few week after OpenAI ChatGPT-4o
- the demo was less polished than the 4o one (in terms of voice quality, voice
0
1
9
@ptrkprz
Patrick Pérez
1 year
Research internships at @kyutai_labs are fun, beside the hard work! A good session by @RamaAdrien
@kyutai_labs
kyutai
1 year
Moshi is not an assistant, but rather a prototype for advancing real-time interaction with machines. It can chit-chat, discuss facts and make recommendations, but a more groundbreaking ability is its expressivity and spontaneity that allow for engaging into fun roleplay.
0
2
13
@ptrkprz
Patrick Pérez
1 year
It feels so good to have shared at last what we have been up to in the past 6 months. We worked hard on this unique voice AI, carefully training it on a mix of text and speech, making it multi-stream and real-time, and putting it in an online demo for everyone to experience it.
@kyutai_labs
kyutai
1 year
Yesterday we introduced Moshi, the lowest latency conversational AI ever released. Moshi can perform small talk, explain various concepts, engage in roleplay in many emotions and speaking styles. Talk to Moshi here https://t.co/a4EbAQiih7 and learn more about the method below 🧵.
4
5
55
@ptrkprz
Patrick Pérez
1 year
Please @abursuc keep one for me!
@abursuc
Andrei Bursuc
1 year
We've just launched our BRAVO robustness and reliability challenge for semantic segmentation. I and @tuan_hung_vu will be giving away these nice stickers @CVPR Ping us or catch us at the posters to find out more! #CVPR2024
0
0
5
@zamir_ar
Amir Zamir
1 year
We are releasing 4M-21 with a permissive license, including its source code and trained models. It's a pretty effective multimodal model that solves 10s of tasks & modalities. See the demo code, sample results, and the tokenizers of diverse modalities on the website. IMO, the
@zamir_ar
Amir Zamir
2 years
We are releasing the 1st version of 4M, a framework for training multimodal foundation models across tens of modalities & tasks, based on scalable masked modeling. Joint effort by @EPFL_en & @Apple. 4M: Massively Multimodal Masked Modeling 🌐 https://t.co/usE17pnXf9 🧵1/n
6
95
345
@valeoai
valeo.ai
1 year
📢We introduce the ScaLR models (code+checkpoints) for LiDAR perception distilled from vision foundation models tl;dr: don’t neglect the choice of teacher, student, and pretraining datasets -> their impact is probably more important than the distillation method #CVPR2024 🧵 [1/8]
1
12
32
@ftm_guney
F. Güney
1 year
we’ve got multiple PhD and postdoc positions funded by my #ERCstg project ENSURE. if you’re interested in computer vision and self-driving, please consider applying. graduate students: apply ASAP! details at https://t.co/LmhaOEeXuL postdocs: send me an email with your CV and
7
28
106
@soundboy
Ian Hogarth
1 year
1/ Today the UK's AI Safety Institute is open sourcing our safety evaluations platform. We call it "Inspect":
gov.uk
The AI Safety Institute has open released a new testing platform to strengthen AI safety evaluations.
7
80
291