Alexandre Défossez Profile
Alexandre Défossez

@honualx

Followers
3,888
Following
490
Media
57
Statuses
595

Founding researcher @kyutai_labs , with strong interests in stochastic optimization, audio generative models, and AI for science.

Paris, France
Joined March 2019
@honualx
Alexandre Défossez
3 years
I'm happy to release the v3 of Demucs for Music Source Separation, with hybrid domain prediction, compressed residual branches and much more. Check out the code: Here is a demo for you @jaimealtozano , I'm sure you'll enjoy the improvements!
14
112
604
@honualx
Alexandre Défossez
4 years
I recently discovered Perlin noise, a stochastic texture generation algorithm used to make realistic fire, smoke, clouds etc. It was developed by Ken Perlin for the CGI of Disney movie Tron in 1982 🤖 (1/N)
Tweet media one
8
56
473
@honualx
Alexandre Défossez
5 years
We have released our platform for source separation in music. We adapt Conv-Tasnet and introduce the Demucs architecture, leading to two state-of-the-art models surpassing all previously known methods such as Wave-U-Net, Open-Unmix or Spleeter.
13
143
472
@honualx
Alexandre Défossez
6 months
AI is nothing without open source, #keepaiopen 🤗
12
74
400
@honualx
Alexandre Défossez
7 months
We release stereo models for all MusicGen variants (+ a new large melody model, both mono and stereo): 6 new models available on HuggingFace (thanks @reach_vb ). We show how a simple fine-tuning procedure with codebook interleaving takes us from boring mono to immersive stereo🎧👇
17
83
410
@honualx
Alexandre Défossez
4 years
I have extended Julius with some extra features: FFT convolutions, FIR filters and decomposition over frequency bands in the waveform domain. All in @PyTorch , differentiable and with CUDA and TorchScript support.
Tweet media one
7
58
366
@honualx
Alexandre Défossez
2 years
With @jadecopet , @syhw and @adiyossLC , we are releasing EnCodec, a state-of-the-art neural audio codec supporting both 24 kHz mono audio and 48 kHz stereo, with bandwidth ranging from 1.5 kbps to 24 kbps 🗜️🎤🤖
8
76
246
@honualx
Alexandre Défossez
6 months
Really excited to be part of the founding team of @kyutai_labs : at the heart of our mission is doing open source and open science in AI🔬📖. Thanks so much to our founding donators for making this happen 🇪🇺 I’m thrilled to get to work with such a talented team and grow the lab 😊
Tweet media one
@kyutai_labs
kyutai
6 months
Announcing Kyutai: a non-profit AI lab dedicated to open science. Thanks to Xavier Niel ( @GroupeIliad ), Rodolphe Saadé ( @cmacgm ) and Eric Schmidt ( @SchmidtFutures ), we are starting with almost 300M€ of philanthropic support. Meet the team ⬇️
Tweet media one
18
164
757
13
6
212
@honualx
Alexandre Défossez
1 year
Today we release MusicGen, a text-to-music auto-regressive model built on EnCodec. It also supports optional melody conditioning based on chromagram extraction! It requires only 50 autoregressive steps per second of audio. Really fun to remix known tunes in all genres 👇 + 🧵
@FelixKreuk
Felix Kreuk
1 year
We present MusicGen: A simple and controllable music generation model. MusicGen can be prompted by both text and melody. We release code (MIT) and models (CC-BY NC) for open research, reproducibility, and for the music community:
36
424
2K
5
44
194
@honualx
Alexandre Défossez
6 months
As a PhD student and RS, I found FAIR a magical place to be: - incredible mentoring in all fields of AI🧑‍🏫 - access to resources and having my own research agenda 🧭 - free and encouraged to publish and open source 📖 For a lot of us there it was a transformative experience 🧑🏻‍🚀
@Josh_XT
Josh XT — e/acc
6 months
@ylecun Meta has definitely been the best thing to happen to AI.
12
5
183
4
7
180
@honualx
Alexandre Défossez
1 year
Official MusicGen now also supports extended generation (different implem, same idea). Go to our colab to test it. And keep an eye on @camenduru for more cool stuff! Of course, I tested it with an Interstellar deep remix as lo-fi with organic samples :)
@camenduru
camenduru
1 year
Good news 🥳 Now we can generate more than 30s, Thanks to rkfg ❤ and Oncorporation ❤ Please try it 🐣 🦆 🖼 stable diffusion model Freedom Redmond by @artificialguybr
10
70
231
6
33
173
@honualx
Alexandre Défossez
4 years
I'm releasing `julius`, a package for fast and differentiable resampling of 1D signals in PyTorch. It uses the same algorithm as resampy but optimized for the case where the ratio of the old and new sample rate is a simple irreducible fraction.
1
26
144
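The irreducible-fraction point can be sketched in a few lines (a hypothetical helper, not julius's actual API): polyphase resampling upsamples by the reduced numerator, filters once, and downsamples by the reduced denominator, so the work depends on the small reduced integers rather than the raw sample rates.

```python
from math import gcd

def resample_ratio(old_sr: int, new_sr: int) -> tuple[int, int]:
    """Reduce old/new sample rates to an irreducible fraction.
    Polyphase resampling then upsamples by new_sr // g, applies one
    low-pass filter, and downsamples by old_sr // g."""
    g = gcd(old_sr, new_sr)
    return old_sr // g, new_sr // g

# 44.1 kHz -> 48 kHz reduces to 147/160: the filter bank size depends
# on these small integers, not on the raw sample rates.
print(resample_ratio(44100, 48000))  # -> (147, 160)
```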
@honualx
Alexandre Défossez
6 months
We do not have a demo booth at #NeurIPS2023 but the MusicGen demo is always online 💻 and all code is open source 📖, with @jadecopet and @FelixKreuk 🎶🥁
2
29
123
@honualx
Alexandre Défossez
8 months
Our work on decoding is now published in Nature Machine Intelligence! We release the code to reproduce our results (and improve on them) based on public datasets (175 subjects, 160+ hours of brain recordings, EEG and MEG) 🧠🔬🖥️
@JeanRemiKing
Jean-Rémi King
8 months
`Decoding speech perception from non-invasive brain recordings`, led by the one and only @honualx is just out in the latest issue of Nature Machine Intelligence: - open-access paper: - full training code:
5
115
458
7
34
133
@honualx
Alexandre Défossez
3 years
Working on the release of the hybrid Demucs model that won the @musdemix challenge. A preview of the improvement on the drums of Californication:
4
13
129
@honualx
Alexandre Défossez
1 year
MusicGen is definitely good at EDM (chroma conditioning from Interstellar used + some EDM description). Sadly the Interstellar theme doesn't really make it through the Chroma transform...
11
21
110
@honualx
Alexandre Défossez
3 years
@adiyossLC @syhw and I are happy to present our work: Differentiable Model Compression with Pseudo Quantization Noise 🗜️💾🤖 Our method, DiffQ, uses additive noise as a proxy for quantization, giving differentiability with no Straight Through Estimator👇
3
24
102
@honualx
Alexandre Défossez
4 years
Wondering how Adam and Adagrad work for non-convex optimization? Go read our paper with L. Bottou, @BachFrancis and N. Usunier, with a 2-page proof covering both, and one message: Adam is to Adagrad what constant step size SGD is to decaying step size SGD.
1
22
101
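The analogy in the tweet can be made concrete by putting the two updates side by side. These are standard textbook forms (momentum and bias correction dropped for clarity), not copied from the paper:

```latex
% Adagrad: the accumulated second moment only grows, so the effective
% step size decays roughly like 1/\sqrt{t}, as in decaying-step SGD.
x_{t+1} = x_t - \frac{\alpha}{\sqrt{\sum_{s \le t} g_s^2}}\, g_t
% Adam (momentum-free): the exponential moving average forgets old
% gradients, so the effective step size stays roughly constant,
% as in constant-step SGD.
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \qquad
x_{t+1} = x_t - \frac{\alpha}{\sqrt{v_t}}\, g_t
```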
@honualx
Alexandre Défossez
1 year
If you are interested in language modeling for audio 🔊/ music generation 🎶, remember that Encodec provides high quality discrete tokens that can be decoded to audio similar to Soundstream ! Both AudioGen and VALL-E built on it 😉
3
15
97
@honualx
Alexandre Défossez
6 months
Looking forward to discussing open research at @kyutai_labs . If you want to work on large scale multimodal LLMs, come and talk to us, this is what we look like 👇☕️
Tweet media one
@neilzegh
Neil Zeghidour
6 months
Look for my @kyutai_labs colleagues at #NeurIPS2023 if you want to learn more about our mission. We are recruiting permanent staff, post-docs and interns!
0
3
39
3
8
93
@honualx
Alexandre Défossez
11 months
EnCodec is now on 🤗 Transformers! Think of it as a low-level latent space 🔮 invertible to audio 🔊 It also provides a discrete space for Language Models, as used in our MusicGen model.
@reach_vb
Vaibhav (VB) Srivastav
11 months
Want to train your own Bark/MusicGen-like TTS/TTA models? 👀 The SoTA Encodec model by @MetaAI has now landed in 🤗Transformers! It supports compression down to 1.5 kbps and produces discrete audio representations. ⚡️ Model: Colab:
10
103
448
2
9
93
@honualx
Alexandre Défossez
4 years
As I am writing my PhD thesis and glueing papers together, I needed to merge bibtex files, remove duplicates and rewrite the tex files to reflect these changes. In case this would be helpful to anyone, here is the code to automate this:
3
17
89
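A sketch of that kind of merge with the standard library only (hypothetical helper, not the released tool; naive parsing that assumes each entry starts with `@` at the beginning of a line):

```python
import re

def merge_bibs(*bib_sources: str) -> str:
    """Concatenate BibTeX sources, keeping only the first entry seen
    for each citation key. Naive: splits on '@' at line starts."""
    seen, merged = set(), []
    for src in bib_sources:
        for entry in re.split(r'\n(?=@)', src.strip()):
            m = re.match(r'@\w+\s*\{\s*([^,\s]+)', entry)
            if m and m.group(1) not in seen:
                seen.add(m.group(1))
                merged.append(entry.strip())
    return '\n\n'.join(merged)

main_bib = "@article{demucs,\n  title={Music Source Separation}\n}"
extra_bib = ("@article{demucs,\n  title={Duplicate}\n}\n"
             "@misc{julius,\n  title={Resampling}\n}")
merged = merge_bibs(main_bib, extra_bib)  # keeps demucs once, plus julius
```

Rewriting `\cite{...}` keys in the .tex files would follow the same pattern with a key-renaming map and `re.sub`.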
@honualx
Alexandre Défossez
10 months
Really excited about this release! We provide all the tools you need to start training your own audio models or just play with the ones we provide🤖🎸🔊And amazing work by @jadecopet for the final sprint 🤾‍♀️🚀
@jadecopet
Jade Copet
10 months
Today we open source the training code for our audio generation and compression research in AudioCraft and share new models. With this release, we aim at giving people the full recipe to play with our models and develop their own models!
4
26
146
5
9
84
@honualx
Alexandre Défossez
4 years
Demucs is now available under the MIT license. I hope this can lead to new applications for this research 🥁🪕🎹🎤🌊 PS: I also released 8bit quantized models, for faster download 🗜️💾
2
24
81
@honualx
Alexandre Défossez
4 months
Watermarking is an increasingly important component of gen' AI, ensuring a safe and detectable usage. With @RobinSanroman , @pierrefdz , @hadyelsahar et al., we release AudioSeal, a faster, less audible, and more reliable watermarking for audio. 🧵🔽 1/6
@RobinSanroman
Robin San Roman
4 months
AudioSeal generates a watermark that hides in the signal. The detector is then able to flag watermarked parts of the audio with sample level precision. 2/n
Tweet media one
1
2
10
3
11
81
@honualx
Alexandre Défossez
6 months
I’m at NeurIPS, with 3 posters from my former team: Simple and Controllable Music Generation, 10:45am Thu. #603 From Discrete Tokens to HiFi Audio using MultiBand Diffusion, 5pm Wed. #604 Textually Pretrained Speech Language Models, #543 same time. Let’s talk about @kyutai_labs too!
2
3
79
@honualx
Alexandre Défossez
10 months
@sanchitgandhi99 is letting me know that we are about to cross the 1 million downloads for MusicGen! And AudioCraft has just crossed 10k⭐️ Really excited and touched to see such a fast adoption by the community ⚡️ @jadecopet @FelixKreuk @adiyossLC @syhw 🙌
Tweet media one
2
10
72
@honualx
Alexandre Défossez
11 months
MusicGen is now supported in the 🤗Transformers library! Thanks a lot Sanchit for the integration. Next step: training / fine-tuning 🎯
@sanchitgandhi99
Sanchit Gandhi
11 months
MusicGen is state-of-the-art model for music generation by Meta AI 🎶 Now available on the main branch of 🤗 Transformers! Check out the Colab here to get started:
4
58
218
1
9
68
@honualx
Alexandre Défossez
7 months
Lucky to have @huggingface making everything easy and fast 😍 thanks @reach_vb and @sanchitgandhi99 for your support!
@reach_vb
Vaibhav (VB) Srivastav
8 months
Generate melodies with MusicGen & Transformers, but faster! ⚡️
import torch
from transformers import pipeline
pipe = pipeline("text-to-audio", "facebook/musicgen-small", torch_dtype=torch.float16)
pipe("upbeat lo-fi music")
That's it! 🤗
12
85
399
3
21
62
@honualx
Alexandre Défossez
5 years
Listen to the result on one of my favorite songs:
3
19
62
@honualx
Alexandre Défossez
8 months
I want to build a small dataset of music descriptions (~40) matching how people use generative models in real life ⌨️🔊 If you want to participate, please comment below with your description and the prefix "My description released under CC NC BY 4.0: ", thanks for the reposts 🙏
38
8
63
@honualx
Alexandre Défossez
6 months
The multi-task gradient balancing operator we introduced for training EnCodec is picking up steam 🚂⚖️ Think of it as having 1 Adam per loss term, except with no extra runtime or memory cost. No more lambda_1=0.001 and lambda_2=250 🤨🧘
@BellecGuill
Guillaume Bellec
6 months
I released today my mini-torch toolkit for multitask learning. The most useful code-bit is minimal re-implementation of @honualx solution to auto-scale losses with very different scaling. Happy to chat if someone's interested.
Tweet media one
1
1
48
0
8
60
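A minimal numpy sketch of the idea (not the actual EnCodec balancer, which works on autograd gradients and smooths norms with an EMA): each loss's gradient is rescaled to a fixed share of a total norm budget, so the weights express relative importance regardless of each loss's natural scale.

```python
import numpy as np

def balance(grads: dict, weights: dict, total_norm: float = 1.0) -> np.ndarray:
    """Combine per-loss gradients so each contributes the fraction of a
    fixed norm budget given by its weight, whatever its raw scale."""
    wsum = sum(weights.values())
    out = np.zeros_like(next(iter(grads.values())))
    for name, g in grads.items():
        norm = np.linalg.norm(g) + 1e-12  # guard against zero gradients
        out += (weights[name] / wsum) * total_norm * g / norm
    return out

# A loss whose raw gradient is 1000x larger no longer dominates:
g = balance({"l1": np.array([1e3, 0.0]), "adv": np.array([0.0, 1.0])},
            {"l1": 1.0, "adv": 1.0})  # -> [0.5, 0.5]
```

This is why the lambda values stop needing to compensate for scale: only their ratios matter.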
@honualx
Alexandre Défossez
2 years
Glad to present our work with @JeanRemiKing , @c_caucheteux on non-invasive brain decoding🧠 1⃣ A contrastive loss is great for hard problems like decoding. 2⃣ It lets us leverage pre-trained Wav2Vec representations. 3⃣ We reach 44% top-1 acc. on MEG data on unseen sentences.
@JeanRemiKing
Jean-Rémi King
2 years
“Decoding speech from non-invasive brain recordings”, Our latest study (on 169 participants!), by @honualx and our wonderful team @MetaAI - paper: - blog: - illustrated summary: below👇
16
150
691
2
9
62
@honualx
Alexandre Défossez
1 year
Demucs v4 is now on PyPI with HTDemucs now the default model. Use `-n mdx_extra_q` if you need the old one back, as for some tracks it might still be best. Also added an experimental 6-source model `htdemucs_6s` with piano and guitar, although I observe some bleeding + artifacts.
3
11
60
@honualx
Alexandre Défossez
2 years
Using a post-processing network with GANs helps remove source separation artefacts. Maybe a sign that we should put less emphasis on the reconstruction SDR!
@_akhaliq
AK
2 years
Music Separation Enhancement with Generative Modeling abs: project page:
Tweet media one
0
30
189
1
6
57
@honualx
Alexandre Défossez
1 year
The dignity of audio scientists is finally restored after a short time with a vision-based SOTA in music gen 🥲 Great work released by Google Brain with @neilzegh @antoine_caillon @jesseengel among others.
@keunwoochoi
Keunwoo Choi
1 year
really well done, from SoundStream and AudioLM through MuLan to MusicLM 👏👏 the overall structure of MusicLM = MuLan + AudioLM = MuLan + w2v-BERT + SoundStream
Tweet media one
2
21
253
2
9
55
@honualx
Alexandre Défossez
3 years
Another day, another project. Seewav is a Python tool to generate videos from audio files. It's not perfect yet, but it is already quite useful for slideshows and demos.
pip install -U seewav
seewav myaudio_file.mp3  # -> writes to out.mp4
0
7
55
@honualx
Alexandre Défossez
4 years
If you want to learn more about Demucs and source separation for music, this is for you 🌊🥁🪕🎹🎤
2
19
55
@honualx
Alexandre Défossez
3 years
Demucs ranked 1st when trained on Musdb only (track A) and 2nd when trained with custom data (track B). Its SDR is nearly 1.5 dB better than before the competition. More info, code and paper coming up! A huge thanks to @musdemix for the organization 🙏
@sounddemix
Sound Demixing Challenge
3 years
The final leaderboards are live! Take a look at
0
11
31
5
14
54
@honualx
Alexandre Défossez
3 years
Is Demucs for music source separation outdated already? With new data aug., on-the-fly resampling, and quantization, Demucs is 150MB, reaches 6.3 SDR (5.6 before) when trained on MusDB, and surpasses the IRM for bass by 0.5 dB with 150 extra songs.👇
Tweet media one
1
16
52
@honualx
Alexandre Défossez
1 year
Huge thanks to @_akhaliq , @julien_c and the team at @huggingface for providing extensive support for the demo 🤗
@_akhaliq
AK
1 year
Meta just released MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't
46
429
2K
1
3
51
@honualx
Alexandre Défossez
1 year
My favorite MusicGen remix from one of Bach's fugues, combined with the prompt "a light and cheerly EDM track, with syncopated drums, aery pads, and strong emotions". Make your own on HF or Google Colab (links in the repo ).
0
9
47
@honualx
Alexandre Défossez
3 years
Happy to share our work on MEG brain activity prediction. This opens up exciting opportunities for deep learning based modeling of brain signals 🧠 🤖 📈
@JeanRemiKing
Jean-Rémi King
3 years
Deep learning improves the analysis of time-resolved brain signals by ... 3️⃣ folds! Check out our latest paper by @lomarchehab *, @honualx *, @loiseau_jc , and @agramfort : Below is the summary thread 👇
7
70
216
1
10
45
@honualx
Alexandre Défossez
4 years
Demucs can now enhance your speech in real time, in the waveform domain 🌊🗣️🎉➰📞👵. We study many tricks and their impact on perceptual quality and intelligibility: spectrogram losses, reverb aug., dry/wet balance. With @syhw and @adiyossLC .
5
15
45
@honualx
Alexandre Défossez
1 year
I will be presenting our work on Transformers for Music Source Separation tomorrow, 6th of May, at ICASSP, during MLSP-P5 poster session on Source Separation, ICA and Sparsity, in the Garden P6 area from 2pm to 3:30pm. Hope to see you there!
@simonrouard
Simon Rouard
2 years
Glad to present our work with @fvsmassa and @honualx « Hybrid Transformers for Music Source Separation » done at @MetaAI . We achieve 9.20 dB of SDR on the MUSDB18 test set. - paper: - code: - audio: 1/5
4
35
151
3
1
44
@honualx
Alexandre Défossez
2 years
Demucs has reached 3k stars on Github 😊 Hopefully by the end of the year we should have new models released with more instruments 🎸 🎹
Tweet media one
0
1
46
@honualx
Alexandre Défossez
6 months
Thanks so much for making this journey happen @RodolpheSaade , @ericschmidt , @Xavier75 . Excited to ship AI in the whole French and European ecosystem with you 🇪🇺🤖⚓️
@RodolpheSaade
Rodolphe Saade
6 months
Thrilled to announce the launch of Kyutai with @Xavier75 & @ericschmidt , the 1st European AI research lab. Researchers & entrepreneurs will have all the resources they need to shape our future. Excited to delve into use cases for transport & logistics. Join us in the AI journey !
Tweet media one
Tweet media two
Tweet media three
11
45
220
0
0
43
@honualx
Alexandre Défossez
4 years
Very nice discussion on the artifacts of transposed convolutions for audio processing. I think that training mostly erases the initial artifacts, and the speed gain makes it a no-brainer. Pons et al. suggest minor changes to remove those from the start!
@jordiponsdotme
Jordi Pons
4 years
Our last paper is out! "Upsampling artifacts in neural audio synthesis" arXiv: Work with @santty128 , Giulio Cengarle and @serrjoa .
Tweet media one
3
34
193
2
9
42
@honualx
Alexandre Défossez
2 years
I know it's not the most fun topic in ML at the moment, but I finally made a Colab for easily playing with our Demucs based speech denoising model:
3
6
39
@honualx
Alexandre Défossez
1 year
We are looking to release the training code and more during the summer @danlyth . This will go far beyond just compression ⚙️🔊
@danlyth
Dan Lyth
1 year
Encodec has just changed to an MIT license. Great news for anyone working on LM approaches to audio or just looking for a high-quality audio codec. No training code but still a really significant change. Thanks @honualx .
0
6
29
2
2
39
@honualx
Alexandre Défossez
1 year
Bark is an awesome text-to-audio model by Suno AI. It combines two great papers on audio modeling: AudioLM by Google and VALL-E by Microsoft, and uses our open source EnCodec model as audio tokens. A sign that open research and open source are essential for our field.
@_akhaliq
AK
1 year
🐶 Bark Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like
Tweet media one
29
200
976
0
6
38
@honualx
Alexandre Défossez
1 year
I've just released two Hybrid Demucs baselines for the SDX23 music demixing challenge, featuring learning with corrupted labels or bleeding. To get started, head over to cc @sounddemix @moises_ai @aicrowdHQ
0
6
36
@honualx
Alexandre Défossez
4 years
@deepwhitman @techatfacebook Spleeter is a spectrogram based method, while Demucs works directly on the waveform. While waveform methods used to lag behind in terms of quality, Demucs was one of the first to outperform spectrogram methods.
2
1
35
@honualx
Alexandre Défossez
6 months
Really impressive performance from the latest Mistral model. Congrats to the whole team, and let that be some inspiration for the open source research in Paris.
@GuillaumeLample
Guillaume Lample @ ICLR 2024
6 months
More details about Mixtral can be found at We are also very happy to announce "La plateforme" our early developer platform (in beta & limited access), to access our models through our API: (7/n)
Tweet media one
5
12
222
0
1
31
@honualx
Alexandre Défossez
5 months
What’s better than the hottest LLM? The hottest LLM running in Rust 😍
@lmazare
Laurent Mazare
6 months
The new Mixtral 8x7b MoE model from @MistralAI is now available in candle - including support for quantized models. These can be run locally on a laptop with 32GB of memory, all this powered by #rust and #opensource !
7
62
434
0
2
30
@honualx
Alexandre Défossez
1 year
Thanks @gordic_aleksa for the in-depth coverage of Encodec, with paper and code! A good watch if you have some knowledge in deep learning and want to learn more about what is going on under the hood.
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
I cover @MetaAI 's "High Fidelity Neural Audio Compression" paper and code. With only 6 kbps bandwidth they already get the same audio quality (as measured by the subjective MUSHRA metric) as mp3 at 64 kbps! YT: @honualx @jadecopet @syhw @adiyossLC 1/
Tweet media one
2
18
116
1
3
29
@honualx
Alexandre Défossez
6 months
MusicGen and Demucs 😊
@syhw
Gabriel Synnaeve
6 months
Built with Musicgen!
0
3
22
1
2
29
@honualx
Alexandre Défossez
3 years
Demucs baseline for the @musdemix challenge is ready, starting at 6.20 overall SDR. Check out the Demucs challenge page to learn how to get started 🌊🥁🎸🎹🎤
1
7
30
@honualx
Alexandre Défossez
10 months
Really excited to see a standard benchmark for going beyond 4 stems source separation! Thanks a lot @moises_ai for the release.
@ArxivSound
arXiv Sound
10 months
``Moisesdb: A dataset for source separation beyond 4-stems. (arXiv:2307.15913v1 []),'' Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl,
1
23
46
1
1
28
@honualx
Alexandre Défossez
11 months
A new generative model for music generation 🎶based on the MaskedGIT approach: fast generation and the ability to do inpainting 🖼️ @hugggof shows how you can generate infinite variations by keeping only one step out of P from the prompt.
@hugggof
hugo flores garcía 🌻
11 months
You can now try VampNet on @huggingface spaces! Thanks @_akhaliq for helping getting it on spaces, and @pseetharaman @ritheshkumar_ and Bryan Pardo for making this dope project happen! try it: show me your coolest/weirdest VampNet loops! :)
3
6
24
0
4
28
@honualx
Alexandre Défossez
4 years
Perlin noise can be extended to 3D or 4D. For higher dimensions, Perlin recommends using Simplex noise, but I didn't go that far. Source: PyTorch notebook:
1
0
25
@honualx
Alexandre Défossez
7 months
First, if you just want to play with it 💻🎶 Demo: Repo: Samples: Demo in Colab: Updated paper (@ Neurips 23 🥳): Now keep on reading for the details 🤓
1
4
27
@honualx
Alexandre Défossez
6 months
This is a central part of our mission @kyutai_labs along with developing the next generation of deep learning modeling techniques. All the inspiring discussions we had at the @huggingface open source party yesterday show that this dynamic is only getting started 💫
1
0
26
@honualx
Alexandre Défossez
6 months
Come see @RobinSanroman at poster #604 to learn more about hi-fi diffusion vocoders.
Tweet media one
0
1
26
@honualx
Alexandre Défossez
6 months
@RobinSanroman will present our work on diffusion vocoders at 5pm, poster #604 : "From discrete tokens to High-Fidelity audio using multi-band diffusion". 😶‍🌫️💻🔊 We use a diffusion U-Net conditioned on EnCodec tokens to replace the original adversarial decoder. 🧵 1/6
3
6
25
@honualx
Alexandre Défossez
1 year
Then, we introduce some optional chromagram based conditioning. This can be computed for any track, and gives a rough idea of the tune of a track, thus allowing easy and controllable "remixing" of any song you like. Play with it on our HuggingFace demo 🤗:
Tweet media one
0
1
23
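As a rough illustration of what a chromagram captures, here is a minimal numpy sketch (hypothetical helper, not MusicGen's actual chroma extraction): fold one frame's magnitude spectrum onto the 12 pitch classes.

```python
import numpy as np

def chroma_frame(frame: np.ndarray, sr: int) -> np.ndarray:
    """Fold one frame's magnitude spectrum onto 12 pitch classes
    (index 0 = A): the rough 'tune' signature used for conditioning."""
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    chroma = np.zeros(12)
    for f, m in zip(freqs[1:], spec[1:]):  # skip the DC bin
        # Distance in semitones from A440, wrapped to one octave.
        pitch_class = int(np.round(12 * np.log2(f / 440.0))) % 12
        chroma[pitch_class] += m
    return chroma / (chroma.sum() + 1e-12)

sr = 22050
t = np.arange(2048) / sr
c = chroma_frame(np.sin(2 * np.pi * 440 * t), sr)  # a pure A440 tone
```

Because octave information is folded away, the model recovers the tune but not the original timbre or register, which is what makes the "remixing" loose and creative.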
@honualx
Alexandre Défossez
2 years
I know some of you noticed the “ht” branch on the Demucs repo a few weeks ago. @simonrouard 's great work is finally out, showing that multi-domain Transformers perform great on music source separation when using an extended dataset.
@simonrouard
Simon Rouard
2 years
Glad to present our work with @fvsmassa and @honualx « Hybrid Transformers for Music Source Separation » done at @MetaAI . We achieve 9.20 dB of SDR on the MUSDB18 test set. - paper: - code: - audio: 1/5
4
35
151
0
3
22
@honualx
Alexandre Défossez
3 years
I'll be presenting this work at the @ismir2021 MDX workshop on Friday at 12:50 UTC. Other winning teams from the contest will be presenting as well, including @WOOSUNGCHOI3 and @LiuHaohe . A huge thanks to the workshop organizers @faroit , @AntoineLiutkus .
1
4
21
@honualx
Alexandre Défossez
9 months
Great upsampling model for all audio modalities by @LiuHaohe based on the AudioLDM design 👏↗️🔊 First it extends the mel-spectrogram with latent diffusion, then generates a waveform with HiFi-GAN. Maybe diffusion for the last stage could be nice too (but slow) 🤔
@LiuHaohe
Haohe Liu
9 months
🔊Introducing AudioSR: a plug-and-play & one-for-all solution to upsample your audio to stunning 48kHz quality! 👉Significant improvement verified on MusicGen (32kHz), AudioLDM (16kHz), and FastSpeech2 (22kHz)! Demo, code, and paper: #AudioSR
Tweet media one
7
43
206
0
0
21
@honualx
Alexandre Défossez
6 months
Love this! Using MusicGen for quick prototyping and enriching actual production is a great use. Most of the skill and magic still comes from the producers 💫
@syhw
Gabriel Synnaeve
6 months
by @NimOne510 @BigDaddyCh0p We're just at the beginning, we had lil' data But A.I. ain't replacing artists anytime A.I. ain't those guys creativity I tell ya They even turn A.I. brain farts into arts h/t @Thom_Wolf for the link
2
5
28
0
0
19
@honualx
Alexandre Défossez
2 years
Impressive jump in SDR on the MusDB HQ dataset (+0.6dB, and an extra 0.7dB with unsupervised data) using multiple DualPathRNN on frequency bands.
@ArxivSound
arXiv Sound
2 years
``Music Source Separation with Band-split RNN. (arXiv:2209.15174v1 []),'' Yi Luo, Jianwei Yu,
0
1
18
0
0
18
@honualx
Alexandre Défossez
4 years
The key idea is to sample a random vector field over a coarse grid, and compute its integral over a finer grid of pixels. Perlin provided an efficient algorithm, where the value at each pixel is computed from a specific interpolation of dot products with the 4 nearby vectors.
Tweet media one
1
1
17
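A compact numpy sketch of that interpolation (hypothetical helper names; 2-D case with Perlin's quintic fade): random unit gradients on a coarse grid, dot products with the offsets to the 4 nearby corners, smoothly blended per pixel.

```python
import numpy as np

def perlin(h, w, gh, gw, seed=0):
    """2-D Perlin noise: random unit gradients on a coarse (gh+1, gw+1)
    grid, smoothly interpolated dot products on a fine (h, w) pixel grid."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, (gh + 1, gw + 1))
    grad = np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # unit vectors

    # Pixel coordinates measured in grid-cell units.
    y = np.linspace(0, gh, h, endpoint=False)
    x = np.linspace(0, gw, w, endpoint=False)
    yi, xi = np.floor(y).astype(int), np.floor(x).astype(int)
    yf, xf = (y - yi)[:, None], (x - xi)[None, :]

    def corner(dy, dx):
        # Dot product of the corner gradient with the offset to the pixel.
        g = grad[yi[:, None] + dy, xi[None, :] + dx]
        off = np.stack([np.broadcast_to(yf - dy, (h, w)),
                        np.broadcast_to(xf - dx, (h, w))], axis=-1)
        return (g * off).sum(-1)

    fade = lambda t: 6 * t**5 - 15 * t**4 + 10 * t**3  # quintic smoothstep
    u, v = fade(yf), fade(xf)
    top = corner(0, 0) * (1 - v) + corner(0, 1) * v
    bot = corner(1, 0) * (1 - v) + corner(1, 1) * v
    return top * (1 - u) + bot * u

noise = perlin(128, 128, 4, 4)  # a 128x128 texture from a 4x4 vector grid
```

Summing calls with doubled grid sizes and halved amplitudes gives the fractal "smoke" texture from the next tweet.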
@honualx
Alexandre Défossez
4 years
𝜋 in 1 tweet (up to 1e-12 error):
q,p,g,pi = 1,0,1e-5,0
while 1:
    nq = q + g*p
    np = p - g*nq
    if (pi > 0)*(np*p <= 0):
        g /= 10
        if g: continue
        else: break
    q,p = nq,np
    pi += g
import math
print(abs(pi - math.pi))
1
2
17
@honualx
Alexandre Défossez
4 years
In order to achieve a smoke-like structure, one can sum Perlin noises with different grid sizes, adding finer and finer details and giving the texture a fractal structure.
Tweet media one
2
0
16
@honualx
Alexandre Défossez
3 years
The released models finished 1st (MusDB only) and 2nd (extra data) at the @musdemix 2021 challenge. SDR is up 1.4 dB on MusDB, and subjective evaluations show fewer artifacts and less contamination. Check out the paper for more info: .
3
1
16
@honualx
Alexandre Défossez
6 months
Neat approach to training-free multimodal labeling from pretrained unimodal models and a small dset of paired text images. Given a new image I, compute its cos sim. with all I_k in the dset -> C_I. Do the same for a new text T with all T_k -> C_T. Now you can dot prod C_I and C_T.
@noranta4
Antonio Norelli
6 months
Next we compute new representations made of cosine similarities. We call them relative representations (rel-reps)
Tweet media one
1
0
10
0
1
16
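A toy numpy sketch of the recipe in the tweet (hypothetical random embeddings standing in for unimodal encoder outputs): represent each new item by its cosine similarities to K paired anchors, then compare across modalities with a dot product.

```python
import numpy as np

def rel_rep(x: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Cosine similarities of x against each anchor row: a 'relative
    representation' living in a shared K-dimensional space."""
    x = x / np.linalg.norm(x)
    A = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return A @ x

rng = np.random.default_rng(0)
K = 8
anchor_imgs = rng.normal(size=(K, 16))  # image embeddings I_k
anchor_txts = rng.normal(size=(K, 12))  # paired text embeddings T_k

# A new image near anchor 3 and a new text near the paired anchor 3
# land on similar similarity profiles, despite different encoders.
C_I = rel_rep(anchor_imgs[3] + 0.01 * rng.normal(size=16), anchor_imgs)
C_T = rel_rep(anchor_txts[3] + 0.01 * rng.normal(size=12), anchor_txts)
score = C_I @ C_T  # high when image and text sit near the same anchors
```

No training is needed because the anchor pairing alone aligns the two similarity spaces.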
@honualx
Alexandre Défossez
3 years
@pozzoli_lucas @musdemix I will release the code and models for the @musdemix workshop at ISMIR on the 12th of November. The workshop will be a cool place to learn about the technology if you are curious.
2
1
16
@honualx
Alexandre Défossez
3 years
Who needs 5G when you can compress audio with Soundstream? 😍 next one is Videostream?
@neilzegh
Neil Zeghidour
3 years
Check Soundstream, our neural audio codec: * outperforms Opus & EVS on speech & music w/ up to 4x fewer bits * scalable: 1 model for all bitrates * runs real-time on 1 smartphone CPU * controllable denoising Paper: Samples 🔊 : 1/5
Tweet media one
7
13
89
0
1
15
@honualx
Alexandre Défossez
3 years
Finally we provide an easy to use and generic API to apply our method to any architecture. Check the repo and paper for more: (5/5)
Tweet media one
1
1
14
@honualx
Alexandre Défossez
2 years
Waveform-based self-supervised representations like Wav2Vec are rich, explaining both low-level processing in the auditory cortex and semantic processing in the prefrontal cortex. Congrats @c_caucheteux et al.
@c_caucheteux
Charlotte Caucheteux
2 years
Result 2: The hierarchy learnt by the algorithm maps onto the brain's: The auditory cortex is best aligned with the first layer of the transformer (blue), whereas the prefrontal cortex is best aligned with its deepest layers (red). 3/n
Tweet media one
1
3
30
1
0
15
@honualx
Alexandre Défossez
4 years
100h of all kind of sounds, human labeled and with clear licensing. That's exciting! 🤖
@edfonseca_
Eduardo Fonseca
4 years
🔊Happy to announce FSD50K: the new open dataset of human-labeled sound events! Over 51k Freesound audio clips, totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. Paper: Dataset:
Tweet media one
5
85
263
1
5
14
@honualx
Alexandre Défossez
2 years
Polyphonic midi transcription agnostic to the instrument. And light on top of that 🙂
@SpotifyEng
Spotify Engineering
2 years
Meet — a 🎹🎸🎻🪕🎺🎷🪗 🎤-to-MIDI converter that uses about 70 billion fewer parameters than those fancy neural networks. #MachineLearning
Tweet media one
1
40
129
0
0
15
@honualx
Alexandre Défossez
1 year
Initializing an audio domain speech LM from a text LM gives a large perplexity boost! Latest work by @MichaelHassid et al. Paper:
@MichaelHassid
Michael Hassid
1 year
We also show that TWIST converges much faster. When using a 350M parameter LM, TWIST achieves similar perplexity with only 1/4 of the training steps. 4/n
Tweet media one
1
0
3
1
0
13
@honualx
Alexandre Défossez
6 months
And on the other side, @MichaelHassid will tell you all about using warm start for your Speech LM at #543 !
Tweet media one
0
0
13
@honualx
Alexandre Défossez
2 years
EnCodec is the first neural audio codec for hi-fi music, with strong ratings from human evaluators.
Tweet media one
1
6
13
@honualx
Alexandre Défossez
3 years
Now that’s some real non supervision 😻
@alex_conneau
Alexis Conneau
3 years
New work: "Unsupervised speech recognition" TL;DR: it's possible for a neural network to transcribe speech into text with very strong performance, without being given any labeled data. Paper: Blog: Code:
3
96
467
0
2
14
@honualx
Alexandre Défossez
2 years
Text guided generation for general audio 📜🤖🎧
@FelixKreuk
Felix Kreuk
2 years
We present “AudioGen: Textually Guided Audio Generation”! AudioGen is an autoregressive transformer LM that synthesizes general audio conditioned on text (Text-to-Audio). 📖 Paper: 🎵 Samples: 💻 Code & models - soon! (1/n)
96
966
5K
0
1
13
@honualx
Alexandre Défossez
4 years
This repo is underrated, cross platform Lame MP3 encoder bindings in Python with binary distrib, as simple as `pip install lameenc`:
0
2
12
@honualx
Alexandre Défossez
3 years
Impressive results for end-to-end classification from the raw waveform.
@neilzegh
Neil Zeghidour
3 years
I will present LEAF, a learnable frontend for audio classification, at ICLR 2021. * Learns filtering, pooling, compression, normalization * Evaluated on 8 tasks, incl. speech, music, birds * Outperforms mel-fbanks, SincNet, and others * SOTA on AudioSet
Tweet media one
Tweet media two
Tweet media three
1
37
176
0
0
13
@honualx
Alexandre Défossez
1 year
DinoV2 can do semantic key point matching for a wide range of objects, aka pose estimation, without being trained for that 😱 @TimDarcet how reliable is that for downstream applications ?
@TimDarcet
TimDarcet
1 year
6/ With these capabilities emerge new interesting properties. A very nice one is the ability to perform semantic keypoint matching between images simply by matching the closest features. This works across very different domains !
Tweet media one
Tweet media two
Tweet media three
2
12
56
1
0
12
@honualx
Alexandre Défossez
7 months
As we keep the same EnCodec, we remain compatible with our existing MultiBandDiffusion model. In fact all samples in the first video were generated using MBD as a decoder. See which @RobinSanroman will present at Neurips 2023 too. One last sample:
1
1
12
@honualx
Alexandre Défossez
2 years
LLMs are nice, but how can we trust them? For maths, @AlbertQJiang , @GuillaumeLample et al. show that they can work alongside formal provers to get the best of both worlds.
@AlbertQJiang
Albert Jiang
2 years
Large language models can write informal proofs, translate them into formal ones, and achieve SoTA performance in proving competition-level maths problems! LM-generated informal proofs are sometimes more useful than the human ground truth 🤯 Preprint: 🧵
Tweet media one
8
153
662
0
4
13
@honualx
Alexandre Défossez
2 years
If you want to try it, check out our GitHub:
Tweet media one
1
3
12
@honualx
Alexandre Défossez
2 years
We performed subjective evaluation on music and speech, and EnCodec performs better than any existing solution.
Tweet media one
1
0
11
@honualx
Alexandre Défossez
1 year
Twitter feels much better with the Tweet Chief blocked.
3
0
11
@honualx
Alexandre Défossez
4 years
Having fun with waveform visualization and source separation 🥁🌊🙃
0
2
10
@honualx
Alexandre Défossez
2 years
Thanks @AudioStrip for providing free and easy access to Demucs 😍
@AudioStrip
AudioStrip
2 years
Demucs V3 is now available on . Offering much better separation, allowing you to sample your favorite songs more easily. (Even 'Pac is excited).
4
4
16
1
0
10