One of the key models in MusicLM is SoundStream, an audio codec. It made vocoders obsolete and recast audio generation as a token prediction task.
SoundStream is not open to the public, but a similar neural audio codec, Encodec, is completely open-source →
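to make this concrete - a minimal sketch of audio → discrete tokens with Encodec, based on the facebookresearch/encodec README (treat exact calls and defaults as version-dependent):

```python
# Minimal sketch: waveform -> discrete codec tokens with Encodec.
# Based on the facebookresearch/encodec README; exact API may vary by version.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # kbps; higher bandwidth -> more codebooks per frame

wav, sr = torchaudio.load("music.wav")  # placeholder path
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    frames = model.encode(wav)  # list of (codes, scale) per chunk
codes = torch.cat([codes for codes, _ in frames], dim=-1)  # [B, n_codebooks, T], integer IDs

# These integer IDs are the "tokens" that turn audio generation into token prediction.
```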
really well done, from SoundStream and AudioLM through MuLan to MusicLM 👏👏
the overall structure of MusicLM
= MuLan + AudioLM
= MuLan + w2v-BERT + SoundStream
MuLan is a text-music joint embedding model.
- contrastive training (rough sketch below)
- 44M music audio–text description pairs from "internet music videos" *cough cough* YouTube *cough cough*
- AST: Audio Spectrogram Transformer
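the contrastive part is CLIP-style: paired audio/text embeddings get pulled together, everything else gets pushed apart. a minimal sketch of such a symmetric contrastive loss (my illustration, not MuLan's actual code; the encoders are assumed given):

```python
# CLIP-style symmetric contrastive loss over paired audio/text embeddings.
# My illustration of the general recipe, not MuLan's actual implementation.
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07):
    # audio_emb, text_emb: [batch, dim]; row i of each comes from the same audio-text pair.
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature               # [batch, batch] similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # matches on the diagonal
    # Symmetric cross-entropy: audio -> text and text -> audio.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```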
Last Friday was my last day after two years at Spotify.
Today, I started working at ByteDance AI Research.
(Based in Mountain View, California in principle, but joining remotely from NYC)
I left ByteDance last Friday. It was such a 1.8 year ❤️ (base-12)
I'm glad I got what I wanted - a novel and intense learning experience. I shipped quite a few things, worked on research back-end tools, and made some research impact.
Now, time to move on :)
🌱 We’re hiring 2024 summer research interns on LLMs for drug discovery and biomedical applications. Join me, @stephenrra, @kchonyc, and other amazing people in NYC to work on the LLM product development of @PrescientDesign / @genentech ✨
Details:
🥳 PROPOSAL: Foley Sound Synthesis Challenge 🥳
There are enough challenges out there for speech and music. We propose one for "the other" kind of audio -> sound. Or effects. Or, Foley.
We need to define the problem, dataset, and eval scheme. How? 🧵🧶
I summarized the differences between `tokenizers.Tokenizer`, `transformers.PreTrainedTokenizer`, and `transformers.PreTrainedTokenizerFast`. I even made a GitHub repo just to post this.
Ahem, ahem.
:
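the tl;dr, as a sketch (these are real `tokenizers` / `transformers` calls, but treat defaults as version-dependent):

```python
# tokenizers.Tokenizer: the Rust-backed core library object.
from tokenizers import Tokenizer
core = Tokenizer.from_pretrained("bert-base-uncased")
enc = core.encode("Hello world")
print(enc.ids, enc.tokens)  # raw Encoding object: ids, tokens, offsets, ...

# transformers.PreTrainedTokenizerFast: a transformers wrapper around that same Rust core.
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # a *Fast tokenizer by default
print(tok("Hello world")["input_ids"])  # model-ready dict: input_ids, attention_mask, ...
print(type(tok.backend_tokenizer))      # -> tokenizers.Tokenizer, the underlying core

# transformers.PreTrainedTokenizer: the older pure-Python base class.
# Same transformers-level API, but no Rust backend underneath.
```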
I joined Gaudio Lab to - I'd dare say - pioneer some audio/music AI! 🥳
I'm more excited than ever :D
Oh, and I'll visit Seoul more often. Friends in 🇰🇷, catch up soon!
THIS IS BIG! All the music folks in Google DeepMind are focusing on one thing: AI music generation while NOT exploiting artists. Nothing is perfect - there are probably still some holes in how credit is given - but this is better than anything before, for sure.
Thrilled to share #Lyria, the world's most sophisticated AI music generation system. From just a text prompt, Lyria produces compelling music & vocals. Also: building new Music AI tools for artists to amplify creativity, in partnership w/ YT & the music industry
New AI music model alert! yes, again 🎉
#SingSong, another music generation model by Google; @chrisdonahuey et al.
Ok let me do another run for collecting followers. How does it work?
the “llama moment” has come to audio research today! i can’t even imagine what we’ll see out of AudioCraft.
whatever you work on in music/audio, do consider using it, as much as you can. if you don’t know what to do, think about what you can do with it and get a head start.
Today we're sharing details on AudioCraft, a new family of generative AI models built for generating high-quality, realistic audio & music from text. AudioCraft is a single code base that works for music, sound, compression & generation — all in the same place.
More details ⬇️
If you belong to an underrepresented group in any sense (gender, race, nationality, financial situation, etc.) and need help with any MIR issue, please just contact me. gnuchoi at the-email-starting-with-G-you-know-what-I-mean😉
for #icassp2024 attendees, i'm open sourcing my `What to eat around COEX` list. originally written for @cwu307, but sharing it with a larger crowd to make the world a better place, reduce p(doom), etc.
Hi people!
My and @kchonyc's #ismir2019 paper, "Deep Unsupervised Drum Transcription" aka 🥁 DrummerNet, is here.
Paper -->
Blog post -->
Supplementary material -->
to recap, i find the whole roadmap really, really brilliant.
- because there's MuLan, they could use audio-only dataset.
- because there's SoundStream, the music generation task was simplified to token generation, not waveform generation.
i'm teaching a class about AI at NYU in Spring 2024: "Deep Learning for Media", a course about AI for audio and visual content.
oof, i thought i'd become an LLM person.
(it's not a job change; i'm covering one class this semester)
happy to get an nyu dot edu account back!
Ok, now (retrospectively, at a high level) it's kinda simple.
given a training item:
- extract MuLan tokens (M), w2v-BERT semantic tokens (S), and SoundStream tokens (A)
- train a model for M → S
- train a model for [M;S] → A
both done by decoder-only transformers (pseudocode sketch below).
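in pseudocode (every name below is hypothetical; this is just the shape of the recipe):

```python
# Pseudocode sketch of the two-stage MusicLM-style training loop.
# All function/model names are hypothetical placeholders.
for audio in music_dataset:                    # audio-only dataset; no text labels needed
    M = mulan_audio_tower(audio)               # MuLan tokens (conditioning)
    S = w2v_bert_semantic_tokens(audio)        # semantic tokens
    A = soundstream_encode(audio)              # acoustic tokens

    # Stage 1: semantic model, M -> S (next-token prediction on S given prefix M).
    loss_semantic = semantic_transformer(prefix=M, target=S)

    # Stage 2: acoustic model, [M; S] -> A.
    loss_acoustic = acoustic_transformer(prefix=concat(M, S), target=A)
```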
👋
I joined @PrescientDesign recently. I distracted @kchonyc with music research circa 2016-2019. This time, he invited me to join his realm -- languages! I'm already having a lot of fun, knowing there's more to come.
<shameless as always>
my papers are the 1st and 6th most cited ISMIR papers of the last 5 years!🔥🔥
heard it was mentioned at the #ismir2021 trivia organized by the titans @r4b1tt @urinieto. i think they should arXiv the trivia and cite my paper thx
AudioLM = w2v-BERT + SoundStream
w2v-BERT is..
- a BERT, but for audio. originally for speech; in AudioLM, an intermediate layer from the speech-pretrained model was used.
- it's "coarse" (250 bps of bitrate).
- it takes care of the semantic information. (see the sketch below)
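concretely, the semantic tokens are k-means cluster IDs over an intermediate w2v-BERT layer. a rough sketch, with a hypothetical `w2v_bert_intermediate_layer()` helper standing in for the real model:

```python
# Rough sketch: semantic tokens = k-means cluster IDs of intermediate w2v-BERT features.
# `w2v_bert_intermediate_layer` is a hypothetical helper returning [n_frames, dim] features.
import numpy as np
from sklearn.cluster import KMeans

feats = np.concatenate([w2v_bert_intermediate_layer(x) for x in train_audio])  # [T_total, dim]
kmeans = KMeans(n_clusters=1024).fit(feats)  # small vocab + low frame rate => low bitrate

def semantic_tokens(audio):
    return kmeans.predict(w2v_bert_intermediate_layer(audio))  # one coarse token per frame
```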
ByteDance/TikTok is hiring research scientists and software developers around music information retrieval and music/audio signal processing in Mountain View, US. Please hit me up! #ismir2020
we're hiring AI/LLM engineers!
- covering both pre-training and post-training tasks
- purely for product development, based on an *extensive understanding of LLMs*
- with real-world impact on drug discovery at Genentech
- no publications in sight
Frequency-aware CNNs. Oops, I was working on the same thing last summer but had no time after some experiments. It worked for music classification and source separation. Go try this!
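(for context: one common recipe for making a spectrogram CNN "frequency-aware" is appending a frequency-coordinate channel, so filters can tell which band they're looking at. a minimal sketch - my illustration, not the paper's code:)

```python
# Minimal sketch: append a normalized frequency-coordinate channel to a spectrogram,
# one common way to make a CNN frequency-aware. My illustration, not the paper's code.
import torch

def add_freq_channel(spec: torch.Tensor) -> torch.Tensor:
    # spec: [batch, 1, n_freq_bins, n_frames]
    b, _, f, t = spec.shape
    coords = torch.linspace(-1.0, 1.0, f, device=spec.device)  # bin index in [-1, 1]
    coords = coords.view(1, 1, f, 1).expand(b, 1, f, t)        # broadcast over batch & time
    return torch.cat([spec, coords], dim=1)                    # [batch, 2, n_freq_bins, n_frames]
```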
We're looking for a junior-level MIR researcher (perhaps a Master's or PhD) in Shanghai to work with me on music tagging and related problems. Expecting to hire ASAP. Please email me if you're interested!
It seems clear to me that TensorFlow developers don't deeply understand why researchers struggle with their product. Life is too short for most researchers to be very good at all of Python and machine learning. TF adds another burden; PyTorch doesn't.
in the training set, no text label is needed, because we.. i mean, googlers.. have a pre-trained MuLan!
also, if you believe in the power of the neural codec, SoundStream, there's no need to train end-to-end with waveforms etc.! SoundStream tokens are good enough!
TikTok🎶 is hiring a research scientist in Music/ML @🇬🇧 London office 🔥 Join our SAMI team to work on Speech, Audio, and Music intelligence with us :)
Please feel free to reach out to me with any questions 📧
inference is straightforward.
do the same as in the training stage, except:
- use the MuLan text model, because we want *text*-to-music.
- after the SoundStream tokens are predicted, feed them to the SoundStream decoder to generate audio. (sketch below)
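in the same hypothetical pseudocode as the training sketch:

```python
# Pseudocode sketch of inference (same hypothetical names as the training sketch).
M = mulan_text_tower("relaxing jazz with a saxophone solo")  # text tower, not audio tower
S = semantic_transformer.generate(prefix=M)                  # sample semantic tokens
A = acoustic_transformer.generate(prefix=concat(M, S))       # sample acoustic tokens
audio = soundstream_decode(A)                                # tokens -> waveform
```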
*QUITE A FEW* papers from our team at ByteDance are accepted to #ismir2021 🚀🚀🚀🚀🚀🚀🚀 I'll share more details once the proceedings are updated.
And yes we're hiring 🔥🔥🔥🔥🔥🔥🔥
Sheet Sage: lead sheets from music audio.
It leverages Jukebox for melody extraction.
Who'd submit this level of amazing work simply to the late-breaking/demo session? This guy → @chrisdonahuey
Long time no first-authoring! Listen, Read, and Identify network (LRID-Net) identifies singing language by reading the metadata (title, album, artist) and listening to the audio.
Our paper about DCASE Challenge T7 - Foley Sound Synthesis was accepted to the DCASE Workshop 🥳
I can't make it to Finland🇫🇮, but some of the authors will be there to tell you what we went through while organizing the first generative challenge at DCASE.
ByteDance 🚀 US Speech / Audio / Music research team is hiring research scientists extensively. If you're a PhD graduating this year, don't wait - just DM me! 🔥🔥
DawDreamer has gained many features recently, including pip install. A new notebook shows how to load Ableton warp marker files, like in this video. Faust integration enables custom polyphonic instruments. Hopefully very useful for ML researchers and artists.
teaching "deep learning for media" at NYU was super fun! now, let me disseminate my students' final projects. these are really cool stuff.
they somehow made it in the vary last minute. i swear none of these was at this level just one week before 😂
anyways, 🧵 starts -
look how shamelessly i'm included here! as always, it was great to connect with all the great researchers in MACLab, supervised by @juhan_nam, at @ISMIRConf.
This year, people from the Music and Audio Computing Lab at KAIST, led by @juhan_nam, participated in @ISMIRConf and presented our work through scientific programs, late-breaking demos, and music sessions!
DCASE Task 7 - Foley Sound Synthesis has finished. It was the very first generative audio AI challenge. I'm very happy to have organized such a successful event! 🎉
The longest-ever video of me speaking in public has become public: "Deep Learning with Audio Signals: Prepare, Process, Design, Expect" at @QConAI. In case me tweeting around you isn't enough.
The #ismir2019 poster repo is hosting 25 posters and has 38 stars now. Would you please 'Like' this tweet if you've ever visited the repo and seen any posters there? I wanna know its impact. Thanks!
Keunwoo Choi, Samuel Kim, and 174 other acoustics and speech scientists are releasing the following statement regarding the case of Professor Myung-jin Bae of Soongsil University, as aired on PD수첩 (PD Notebook). We have also sent this statement to the Acoustical Society of Korea and to the board members and professors of Soongsil University's School of Electronic Engineering. Thank you. Statement:
Can't wait to share our new Text-to-Audio model, AudioLDM. 😆
This video shows the generation result with a simple text prompt: "A music made by xxx".
More demos coming soon!😉
The paper will be available next Monday on arXiv! 😊
Our model will be open-sourced soon!😎
two sides of making music.
(a) manufacturing music
(b) expressing creativity through music
i see prompting music Gen AI - to get the final, whole audio - purely as (a), which is totally fine as long as its training is done legally.
I've been an audio person for 10+ years. Let me tell you - you don't need 192kHz/24-bit or anything. If you don't like the audio quality from any legit music streaming service, it's NOT about the codec. Get a better connection, a quieter place, better earbuds.