
Patrick Pérez
@ptrkprz
702 Followers · 49 Following · 0 Media · 38 Statuses
AI & CV scientist, CEO at @kyutai_labs
Paris
Joined December 2023
As promised, we are sharing the technology behind Moshi: paper+models+inference code for everyone.
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust and MLX. More details below 🧵 ⬇️ Paper: https://t.co/JQtEMppifK Repo:
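Mimi belongs to the family of neural audio codecs built around residual vector quantization (RVQ): each quantizer stage encodes the residual left over by the previous stages, so a few small codebooks compose into a fine-grained code. A minimal illustration of the general technique in plain Python (a sketch, not Mimi's actual implementation):

```python
def nearest(codebook, v):
    """Index of the codebook vector closest to v (squared L2 distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def rvq_encode(v, codebooks):
    """Residual vector quantization: each stage quantizes what the
    previous stages left over, yielding one index per codebook."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codewords across stages."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out
```

Decoding simply sums the chosen codewords across stages; each added stage refines the reconstruction, which is why a handful of small codebooks can stand in for one enormous one.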
New sharing step on our journey towards easy-to-use fully-open models.
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! https://t.co/X4Dbx2T1cJ
I’ll be presenting a deep dive into how Moshi works at the next NLP Meetup in Paris, this Wednesday the 9th at 7pm. Register if you want to attend! 🧩🔎🟢 https://t.co/1ZPb105JKX
meetup.com
📍 8 rue Cambacérès, 75008 Paris 📆 October 9th, 7:00 p.m. ⚠️ Limited spots available. Be sure to reserve your place in advance! 👥 Alexandre Défossez - Chief Explora
Serious stress testing!
Voice AIs handle speaker turns & interruptions with Voice Activity Detection. VAD is brittle and gets triggered by background noise, creating frequent hiccups. Moshi gets rid of it completely, so you can use it in the most chaotic settings. I myself couldn't hear Moshi here 😅
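To see why VAD is brittle, consider a deliberately naive energy-threshold detector (an illustrative sketch, not any production VAD): any sufficiently loud background noise crosses the threshold and gets mistaken for speech, triggering a spurious speaker turn.

```python
import random

def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)

def vad(frames, threshold=0.01):
    """Naive VAD: a frame counts as 'speech' when its energy
    exceeds a fixed threshold."""
    return [frame_energy(f) > threshold for f in frames]

random.seed(0)
# Ten frames of near-silence vs. ten frames of loud background noise
# (no speech in either case).
quiet = [[random.gauss(0, 0.01) for _ in range(160)] for _ in range(10)]
noisy = [[random.gauss(0, 0.3) for _ in range(160)] for _ in range(10)]
# The quiet frames stay below threshold, but every noise-only frame
# crosses it and would be mistaken for the user starting to speak.
```

Real VADs are smarter than a single energy threshold, but they still draw a hard speech/non-speech boundary, which is exactly what a noisy café or conference hall defeats.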
Moshi is a very nice/fun conversational AI audio 🔊 model release from @kyutai_labs. Are you slowly losing faith in the objective reality and existence of Advanced Voice Mode? Talk to Moshi instead :) You can talk to it on their website: https://t.co/OQpIaXx8wL Or even locally
Moshi can even be explored on a vacation beach or in a conference center, as it is robust to noisy environments
"Hippie" Moshi tells its love for Hendrix...but "skeptical" Moshi is less enthusiastic about psychedelic rock. Moshi can play 70+ emotions, will you catch them all? Try now at https://t.co/lU2sqa8wMQ
Staying in real-time connection with voice AI in Paris while being in Vienna
The attentive listener will notice that even when speaking over Alex, Moshi still listens (taking into account the "in space" instruction for the second poem)
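The trick behind this full-duplex behavior is that the model sees both sides of the conversation at once. As a toy illustration of the multi-stream idea (a simplification, not Moshi's actual frame layout), two token streams can be interleaved so that a single autoregressive model attends to the user's audio even while it is generating its own:

```python
def interleave(user_stream, model_stream):
    """Merge two equal-length token streams into one sequence,
    one (user, model) pair per time step, so a single autoregressive
    model conditions on both sides of the conversation at every step."""
    assert len(user_stream) == len(model_stream)
    merged = []
    for u, m in zip(user_stream, model_stream):
        merged.extend([("user", u), ("model", m)])
    return merged
```

Because the user's tokens are present at every step, "speaking over" the model just changes what it conditions on; there is no separate turn-taking switch to flip.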
Some Moshi extracts! Get your own at https://t.co/SVQZQ9UlEN Don't forget to click "Download video" at the end (if it's good) 🟢
And our demo runs in the US thanks to a donation from @huggingface
Thanks @Thom_Wolf! Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point… not a productized conversational bot.
The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people missed in the room. Mostly irrelevant are the facts that: - they come a few weeks after OpenAI ChatGPT-4o - the demo was less polished than the 4o one (in terms of voice quality, voice
Research internships at @kyutai_labs are fun, besides the hard work! A good session by @RamaAdrien
Moshi is not an assistant, but rather a prototype for advancing real-time interaction with machines. It can chit-chat, discuss facts and make recommendations, but a more groundbreaking ability is its expressivity and spontaneity, which allow for engaging in fun roleplay.
It feels so good to have shared at last what we have been up to over the past 6 months. We worked hard on this unique voice AI, carefully training it on a mix of text and speech, making it multi-stream and real-time, and putting it in an online demo for everyone to experience.
Yesterday we introduced Moshi, the lowest-latency conversational AI ever released. Moshi can make small talk, explain various concepts, and engage in roleplay with many emotions and speaking styles. Talk to Moshi here https://t.co/a4EbAQiih7 and learn more about the method below 🧵.
Please @abursuc keep one for me!
We've just launched our BRAVO robustness and reliability challenge for semantic segmentation. @tuan_hung_vu and I will be giving away these nice stickers at @CVPR. Ping us or catch us at the posters to find out more! #CVPR2024
We are releasing 4M-21 with a permissive license, including its source code and trained models. It's a pretty effective multimodal model that handles tens of tasks & modalities. See the demo code, sample results, and the tokenizers of diverse modalities on the website. IMO, the
We are releasing the 1st version of 4M, a framework for training multimodal foundation models across tens of modalities & tasks, based on scalable masked modeling. Joint effort by @EPFL_en & @Apple. 4M: Massively Multimodal Masked Modeling 🌐 https://t.co/usE17pnXf9 🧵1/n
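At its core, the masked modeling objective mentioned above hides a random subset of tokens and trains the model to reconstruct them from the rest. A minimal single-stream sketch of that training setup (hypothetical helper names, not the 4M code):

```python
import random

def mask_tokens(tokens, mask_ratio=0.5, mask_token="<MASK>", seed=0):
    """Masked-modeling setup: hide a random subset of tokens; the
    model's training target is to reconstruct the hidden ones."""
    rng = random.Random(seed)
    n_mask = int(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [mask_token if i in masked_idx else t
              for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked_idx}
    return inputs, targets
```

The returned `targets` dict is what the loss is computed over; what makes 4M "massively multimodal" is applying this recipe jointly across token streams from many modalities, which this toy single-stream version leaves out.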
📢We introduce the ScaLR models (code+checkpoints) for LiDAR perception distilled from vision foundation models tl;dr: don’t neglect the choice of teacher, student, and pretraining datasets -> their impact is probably more important than the distillation method #CVPR2024 🧵 [1/8]
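The distillation setup the thread refers to boils down to making the LiDAR student's per-point features match the frozen vision teacher's features for the same points. A minimal version of such a feature-alignment loss (an illustrative sketch, not the ScaLR implementation):

```python
def mse(a, b):
    """Mean squared error between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_feats, teacher_feats):
    """Average per-point MSE between student features (e.g. from a
    LiDAR backbone) and frozen teacher features (e.g. from a vision
    foundation model) projected onto the same 3D points."""
    per_point = [mse(s, t) for s, t in zip(student_feats, teacher_feats)]
    return sum(per_point) / len(per_point)
```

With the loss this simple, the thread's tl;dr follows naturally: most of the performance headroom sits in the choice of teacher, student, and pretraining data rather than in the loss itself.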
We’ve got multiple PhD and postdoc positions funded by my #ERCstg project ENSURE. If you’re interested in computer vision and self-driving, please consider applying. Graduate students: apply ASAP! Details at https://t.co/LmhaOEeXuL Postdocs: send me an email with your CV and
1/ Today the UK's AI Safety Institute is open-sourcing our safety evaluations platform. We call it "Inspect":
gov.uk
The AI Safety Institute has open-sourced a new testing platform to strengthen AI safety evaluations.