Eduardo Fonseca
@edfonseca_
Followers: 1K · Following: 1K · Media: 23 · Statuses: 183
Research Scientist @GoogleDeepMind. Sound Understanding. Previously @GoogleAI and @mtg_upf. He/him.
NYC
Joined October 2017
🔊New paper! Recomposer allows editing sound events within complex scenes based on textual descriptions and event roll representations. And we discuss the details that matter! Work led by Dan Ellis w/ a bunch of Sound Understanding folks @GoogleDeepMind
https://t.co/J6F57BqSMn
Excited to share our work from the Sound Understanding team at @GoogleDeepMind! Ever wanted to remove a single cough from a recording or make a faint doorbell louder? Recomposer makes editing complex audio scenes possible! Paper: https://t.co/bd6lx5b938
#AudioEditing #GenerativeAI
arxiv.org
Editing complex real-world sound scenes is difficult because individual sound sources overlap in time. Generative models can fill in missing or corrupted details based on their strong prior...
Daniel P. W. Ellis, Eduardo Fonseca, Ron J. Weiss, Kevin Wilson, Scott Wisdom, Hakan Erdogan, John R. Hershey, Aren Jansen, R. Channing Moore, Manoj Plakal, "Recomposer: Event-roll-guided generative audio editing."
New multilingual speech restoration paper out Miipher-2 🚀! The RTF on a TPU is 0.0078: 1 million hours of data can be cleaned in 3 days using just 100 TPUs! Paper: https://t.co/lohyU54t4a Demo: https://t.co/LQuVMgJChJ
arxiv.org
Training data cleaning is a new application for generative model-based speech restoration (SR). This paper introduces Miipher-2, an SR model designed for million-hour scale data, for training data...
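The throughput claim in the Miipher-2 tweet can be sanity-checked with quick arithmetic. A minimal sketch, assuming RTF (real-time factor) means processing time divided by audio duration per TPU, and that the 100 TPUs run in parallel:

```python
# Back-of-the-envelope check of the "1M hours in 3 days on 100 TPUs" claim.
RTF = 0.0078
audio_hours = 1_000_000
num_tpus = 100

tpu_hours = audio_hours * RTF            # total compute: 7,800 TPU-hours
wall_clock_hours = tpu_hours / num_tpus  # 78 hours when spread over 100 TPUs
wall_clock_days = wall_clock_hours / 24  # 3.25 days

print(f"{wall_clock_days:.2f} days")     # ~3 days, consistent with the tweet
```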
I'm looking for a PhD student to work on continual learning for audio. Funding is available for 2 years to start with, to be extended to 4 later. Contact me by email if interested! If you participated in @DCASE_Challenge or are coming to @DCASE_Workshop, even better!
🔊 We've released pre-trained models & code for our ICCV23 paper, Audiovisual Masked Autoencoders!! GitHub: https://t.co/NDbPZgefCo Paper: https://t.co/rtLhWOq872 Work led by Lili Georgescu and @anuragarnab, with Radu Ionescu, @MarioLucic_ and @CordeliaSchmid
arxiv.org
Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and...
It's so awesome to see the impact of the computational audio capabilities we developed featured in @madebygoogle 🎉 🎉 🎉 Congrats to John Hershey, @ScottTWisdom, @PGetreuer & everyone who contributed, for pioneering new computational audio capabilities in Pixel 8 #MadeByGoogle
Check out the 4 new Google Photos features coming first to Pixel 8 and 8 Pro ↓ Whether it’s noise from wind, traffic, or barking dogs, Audio Magic Eraser in Google Photos reduces distracting sounds in your video in just a few taps! 🪄
🔊New paper out: Do you use data balancing in your AudioSet experiments? Does it get you a little mAP boost? It might work differently than you think...😅 You might want to check our latest paper, led by @ChannningMoore
Our 2023 ICASSP paper is now up on arXiv: Dataset balancing can hurt model performance https://t.co/TyawmB1OWK Dataset balancing works differently than you might assume:
- can cause overfitting;
- doesn't improve performance on rare classes;
- speeds up training convergence.
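For context, "dataset balancing" here usually means inverse-frequency sampling: examples from rare classes are drawn more often so every class contributes roughly equally per epoch. A minimal sketch with made-up, AudioSet-like class counts (the numbers and variable names are illustrative, not from the paper):

```python
import numpy as np

# Toy class frequencies with the heavy skew typical of AudioSet.
class_counts = np.array([10_000, 500, 20])
per_class_weight = 1.0 / class_counts      # rare classes get upweighted

# Each example inherits its class's weight; normalize to a sampling distribution.
example_labels = np.array([0, 0, 1, 2])    # toy dataset of 4 clips
weights = per_class_weight[example_labels]
probs = weights / weights.sum()            # rare-class clip dominates sampling
```

The paper's point is that this oversampling of rare-class clips can cause overfitting on exactly those classes, rather than improving them.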
Not sure my reviews were that "outstanding"... 😅, but the recognition is nice... Thanks to the @ieeeICASSP committee. #ICASSP2023
📢 The countdown to DCASE Challenge 2023 deadline is on! 🗓️ Deadlines:
* System submission: 15/5 - 23.59 AoE
* Technical reports: 22/5 - 23.59 AoE
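AoE ("Anywhere on Earth") is UTC-12: the deadline holds as long as it is still that date somewhere on Earth. A quick check of what the 15/5 system-submission deadline means in UTC:

```python
from datetime import datetime, timedelta, timezone

# AoE is the UTC-12 timezone.
AOE = timezone(timedelta(hours=-12))

deadline_aoe = datetime(2023, 5, 15, 23, 59, tzinfo=AOE)
deadline_utc = deadline_aoe.astimezone(timezone.utc)
print(deadline_utc)  # 2023-05-16 11:59:00+00:00
```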
📣 The results for @DCASE_Challenge task 4 are finally out! 🥳 https://t.co/KSgFwNq4K0
@Nicoturpo @FraRonchini @edfonseca_ @SamueleCornell
Aaand that was after defending my PhD thesis some months ago "Training Sound Event Classifiers Using Different Types of Supervision" & taking some time off :) Thesis/video/slides & a quick summary available here: https://t.co/HrmsUURYv9 SUPER thankful to all @mtg_upf folks!!🙌
🔊 A bit late, but happy to announce that I recently joined Google Research! I’m working in the Sound Understanding Group based out of NYC! https://t.co/kbD1U0okFc
HEAR PMLR journal submissions are open until 2022-06-30. https://t.co/URhf1PPgrY Besides that, people have asked if they can run the HEAR benchmarks, get on the leaderboard, and cite us in the future. Yes! HEAR is here to stay. See our updated website: https://t.co/VuSnPYF095
📣 We have a (super cool) PhD position in speech enhancement for patients with auditory neuropathy spectrum disorders. If you're interested in Audio Signal Processing/Machine Learning/Audiology, contact us! More info ⤵️ https://t.co/VVXWsIz4Fb
Looking forward to seeing you all here! 🥳
📢The #DCASE2022 workshop call for papers is out 🥳🎉 https://t.co/2rU1IzJkMG The abstract submission deadline is on the 7th of July, and the workshop will be held in person from the 3rd to the 4th of November in Nancy. Looking forward to seeing you there! 😉
Our new paper is out! We explored simple masked patch modeling w/o augmentation to learn a latent that describes the input spectrogram as it is. “Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation” https://t.co/kWIEMsGzNZ
arxiv.org
Recent general-purpose audio representations show state-of-the-art performance on various audio tasks. These representations are pre-trained by self-supervised learning methods that create...
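The masked-patch idea this tweet describes can be sketched in a few lines. This is a hypothetical illustration (shapes, names, and the 75% mask ratio are assumptions, not the paper's code): tile a spectrogram into non-overlapping patches, hide a large random fraction, and train the autoencoder to reconstruct the hidden ones from the visible ones.

```python
import numpy as np

rng = np.random.default_rng(0)
spec = rng.standard_normal((80, 208))   # toy log-mel spectrogram: (mel bins, frames)
ph, pw = 16, 16                          # patch height/width

# Tile into non-overlapping 16x16 patches, flattened to vectors.
patches = spec.reshape(80 // ph, ph, 208 // pw, pw).transpose(0, 2, 1, 3)
patches = patches.reshape(-1, ph * pw)   # (65 patches, 256 values each)

# Mask a large random subset; the encoder only sees the visible patches,
# and the decoder is trained to reconstruct the masked ones.
mask_ratio = 0.75
n_masked = int(len(patches) * mask_ratio)
masked_idx = rng.permutation(len(patches))[:n_masked]
visible = np.delete(patches, masked_idx, axis=0)
```

Because no augmentation is applied, the reconstruction target is the input spectrogram exactly as it is, which is the point the tweet emphasizes.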
https://t.co/voGOpmu0pe For BYOL for Audio, an updated paper is out (submitted last year, still under review). It extends the initial BYOL-A in network architecture and data augmentation. We compare against 8 models (11 representations) on a benchmark of 10 tasks.
arxiv.org
Pre-trained models are essential as feature extractors in modern machine learning systems in various domains. In this study, we hypothesize that representations effective for general audio tasks...
"BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations" (arXiv:2204.07402v1, https://t.co/3pcQCkeyAA), Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino.
#DCASE2022 Challenge is officially open! You can now check the task descriptions and development data. The baseline systems for some tasks are delayed, but those will be ready soon too. https://t.co/0aOOLZIt9B
@DCASE_Challenge #machinelistening #DCASE
📢 DCASE challenge 2022 task descriptions are out!! Enjoy ➡️