Eduardo Fonseca

@edfonseca_

Followers: 1K · Following: 1K · Media: 23 · Statuses: 183

Research Scientist @GoogleDeepMind. Sound Understanding. Previously @GoogleAI and @mtg_upf. He/him.

NYC
Joined October 2017
@edfonseca_
Eduardo Fonseca
2 months
🔊New paper! Recomposer allows editing sound events within complex scenes based on textual descriptions and event roll representations. And we discuss the details that matter! Work led by Dan Ellis w/ a bunch of Sound Understanding folks @GoogleDeepMind https://t.co/J6F57BqSMn
0
4
42
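For readers unfamiliar with the term, an "event roll" is conventionally a class-by-time activity matrix, analogous to a piano roll. A minimal sketch of the idea (the class names, frame rate, and exact format here are illustrative assumptions, not the paper's actual representation):

```python
import numpy as np

# Hypothetical illustration of an event roll: a class-by-frame binary
# matrix marking where each sound event is active in the scene.
CLASSES = ["cough", "doorbell", "speech"]  # assumed label set
FRAME_RATE = 10  # frames per second (assumed)

def event_roll(events, duration_s, classes=CLASSES, frame_rate=FRAME_RATE):
    """events: list of (class_name, onset_s, offset_s) tuples."""
    n_frames = int(duration_s * frame_rate)
    roll = np.zeros((len(classes), n_frames), dtype=np.int8)
    for name, onset, offset in events:
        row = classes.index(name)
        roll[row, int(onset * frame_rate):int(offset * frame_rate)] = 1
    return roll

roll = event_roll([("cough", 1.0, 1.5), ("speech", 0.0, 3.0)], duration_s=3.0)
print(roll.shape)      # (3, 30)
print(roll[0, 10:15])  # cough active in frames 10-14
```

Conditioning an editing model on such a roll pins each requested edit to a specific class and time span, which free-text alone cannot do.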
@vivek_kumar
Vivek Kumar
2 months
Excited to share our work from Sound Understanding team at @GoogleDeepMind! Ever wanted to remove a single cough from a recording or make a faint doorbell louder? Recomposer makes editing complex audio scenes possible! Paper: https://t.co/bd6lx5b938 #AudioEditing #GenerativeAI
arxiv.org
Editing complex real-world sound scenes is difficult because individual sound sources overlap in time. Generative models can fill-in missing or corrupted details based on their strong prior...
@edfonseca_
Eduardo Fonseca
2 months
🔊New paper! Recomposer allows editing sound events within complex scenes based on textual descriptions and event roll representations. And we discuss the details that matter! Work led by Dan Ellis w/ a bunch of Sound Understanding folks @GoogleDeepMind https://t.co/J6F57BqSMn
1
5
24
@ArxivSound
arXiv Sound
2 months
Daniel P. W. Ellis, Eduardo Fonseca, Ron J. Weiss, Kevin Wilson, Scott Wisdom, Hakan Erdogan, John R. Hershey, Aren Jansen, R. Channing Moore, Manoj Plakal, "Recomposer: Event-roll-guided generative audio editing,"
0
7
23
@yuma_koizumi
Yuma Koizumi
7 months
New multilingual speech restoration paper out Miipher-2 🚀! The RTF on a TPU is 0.0078: 1 million hours of data can be cleaned in 3 days using just 100 TPUs! Paper:  https://t.co/lohyU54t4a Demo:  https://t.co/LQuVMgJChJ
arxiv.org
Training data cleaning is a new application for generative model-based speech restoration (SR). This paper introduces Miipher-2, an SR model designed for million-hour scale data, for training data...
3
31
84
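The throughput claim above checks out, assuming RTF means compute seconds per second of audio on a single TPU:

```python
# Back-of-the-envelope check of the Miipher-2 throughput claim.
rtf = 0.0078
hours_of_audio = 1_000_000
n_tpus = 100

tpu_hours = hours_of_audio * rtf           # total compute: 7,800 TPU-hours
wall_clock_days = tpu_hours / n_tpus / 24  # spread across 100 TPUs
print(round(wall_clock_days, 2))           # ~3.25 days
```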
@AnnamariaMsros
Annamaria Mesaros
1 year
I'm looking for a PhD student to work on continual learning for audio. Funding available for 2 years to start with, to be extended to 4 later. Contact me through email if interested! If you participated in @DCASE_Challenge or are coming to @DCASE_Workshop, even better!
3
32
113
@edfonseca_
Eduardo Fonseca
2 years
🔊 We've released pre-trained models & code for our ICCV23 paper, Audiovisual Masked Autoencoders!! GitHub:  https://t.co/NDbPZgefCo Paper:  https://t.co/rtLhWOq872 Work led by Lili Georgescu and @anuragarnab , with Radu Ionescu, @MarioLucic_ and @CordeliaSchmid
arxiv.org
Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and...
0
5
46
@vivek_kumar
Vivek Kumar
2 years
It's so awesome to see the impact of the computational audio capabilities we developed featured in @madebygoogle 🎉 🎉 🎉 Congrats to John Hershey, @ScottTWisdom, @PGetreuer & everyone who contributed for pioneering new computational audio capabilities in Pixel8 #MadeByGoogle
@googlephotos
Google Photos
2 years
Check out the 4 new Google Photos features coming first to Pixel 8 and 8 Pro ↓ Whether it’s noise from wind, traffic, or barking dogs, Audio Magic Eraser in Google Photos reduces distracting sounds in your video in just a few taps! 🪄
4
16
60
@edfonseca_
Eduardo Fonseca
2 years
🔊New paper out: Do you use data balancing in your AudioSet experiments? It gets you a little mAP boost? It might work differently than you think...😅 You might want to check our latest paper, led by @ChannningMoore
@ChannningMoore
Channing Moore
2 years
Our 2023 ICASSP paper is now up on arXiv: Dataset balancing can hurt model performance https://t.co/TyawmB1OWK Dataset balancing works differently than you might assume: - can cause overfitting; - doesn’t improve performance on rare classes; - speeds up training convergence.
0
1
16
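The kind of balancing the paper examines is typically inverse-frequency sampling over a long-tailed label distribution. A toy sketch (the class names and counts are made up; this is not the paper's setup) that shows why it can promote overfitting:

```python
import random
from collections import Counter

# Hypothetical balanced sampler: draw training examples with probability
# inversely proportional to their class frequency, so rare classes appear
# about as often as common ones.
random.seed(0)
labels = ["dog"] * 900 + ["siren"] * 90 + ["theremin"] * 10  # long-tailed toy set
freq = Counter(labels)
weights = [1.0 / freq[c] for c in labels]

balanced = random.choices(range(len(labels)), weights=weights, k=3000)
seen = Counter(labels[i] for i in balanced)
print(seen)  # roughly 1000 draws per class

# The rare class's 10 examples are each drawn ~100 times -- the repeated
# exposure behind the overfitting risk the thread describes.
```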
@edfonseca_
Eduardo Fonseca
2 years
Not sure my reviews were that "outstanding"... 😅, but the recognition is nice...  Thanks to the @ieeeICASSP committee.  #ICASSP2023
0
0
28
@AnnamariaMsros
Annamaria Mesaros
3 years
📢 The countdown to DCASE Challenge 2023 deadline is on! 🗓️ Deadlines: * System submission: 15/5 - 23.59 AoE * Technical reports: 22/5 - 23.59 AoE
0
2
6
@RSerizel
Romain Serizel
3 years
1
4
18
@edfonseca_
Eduardo Fonseca
3 years
Aaand that was after defending my PhD thesis some months ago "Training Sound Event Classifiers Using Different Types of Supervision" & taking some time off :) Thesis/video/slides & a quick summary available here: https://t.co/HrmsUURYv9 SUPER thankful to all @mtg_upf folks!!🙌
0
0
14
@edfonseca_
Eduardo Fonseca
3 years
🔊 A bit late, but happy to announce that I recently joined Google Research! I’m working in the Sound Understanding Group based out of NYC! https://t.co/kbD1U0okFc
9
7
158
@hearbenchmark
HEAR Benchmark
3 years
HEAR PMLR journal submissions are open until 2022-06-30. https://t.co/URhf1PPgrY Besides that, people have asked if they can run HEAR benchmarks, get on the leaderboard, cite us in the future. Yes! HEAR is here to stay. See our updated website:  https://t.co/VuSnPYF095
0
5
17
@RSerizel
Romain Serizel
3 years
📣 We have a (super cool) PhD position in speech enhancement for patients with auditory neuropathy spectrum disorders. If you're interested in Audio Signal Processing/Machine Learning/Audiology, contact us! More info ⤵️ https://t.co/VVXWsIz4Fb
1
10
14
@RSerizel
Romain Serizel
4 years
Looking forward to seeing you all here! 🥳
@DCASE_Workshop
DCASE Workshop
4 years
📢The #DCASE2022 workshop call for papers is out 🥳🎉 https://t.co/2rU1IzJkMG The abstract submission deadline is on 7th of July, and the workshop will be held in person from 3rd to 4th of November in Nancy. Looking forward to seeing you there! 😉
0
1
2
@nizumical
daisukelab
4 years
Our new paper is out! We explored simple masked patch modeling w/o augmentation to learn a latent that describes the input spectrogram as it is. “Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation” https://t.co/kWIEMsGzNZ
arxiv.org
Recent general-purpose audio representations show state-of-the-art performance on various audio tasks. These representations are pre-trained by self-supervised learning methods that create...
2
2
29
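The core of masked spectrogram modeling is hiding most patches of the input before encoding. A minimal sketch of that masking step (the patch size, mask ratio, and spectrogram shape here are assumptions for illustration, not the paper's settings):

```python
import numpy as np

# Toy masking step for masked spectrogram modeling.
rng = np.random.default_rng(0)
spec = rng.standard_normal((80, 200))  # mel bins x time frames (toy input)
PATCH = 16
MASK_RATIO = 0.75

# Split the spectrogram into non-overlapping 16x16 patches.
n_f, n_t = spec.shape[0] // PATCH, spec.shape[1] // PATCH
patches = spec[: n_f * PATCH, : n_t * PATCH].reshape(n_f, PATCH, n_t, PATCH)
patches = patches.transpose(0, 2, 1, 3).reshape(n_f * n_t, PATCH * PATCH)

# Keep a random 25% of patches; the encoder sees only these, and the
# decoder is trained to reconstruct the masked 75%.
n_keep = int(len(patches) * (1 - MASK_RATIO))
keep = rng.permutation(len(patches))[:n_keep]
visible = patches[keep]
print(patches.shape, visible.shape)  # (60, 256) (15, 256)
```

Because no augmentation is involved, the reconstruction target is the input spectrogram itself, which is what lets the learned latent "describe the input as it is."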
@nizumical
daisukelab
4 years
https://t.co/voGOpmu0pe For BYOL for Audio, an updated paper is out (submitted last year, still under review). It extends the initial BYOL-A in its network architecture and data augmentation. We compare with 8 models (11 representations) using a benchmark with 10 tasks.
arxiv.org
Pre-trained models are essential as feature extractors in modern machine learning systems in various domains. In this study, we hypothesize that representations effective for general audio tasks...
@ArxivSound
arXiv Sound
4 years
"BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations" (arXiv:2204.07402v1, https://t.co/3pcQCkeyAA), Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
1
1
8
@AnnamariaMsros
Annamaria Mesaros
4 years
#DCASE2022 Challenge is officially open! You can now check the task descriptions and development data. Some tasks have delays with the baseline system, but those will be ready soon too. https://t.co/0aOOLZIt9B @DCASE_Challenge #machinelistening #DCASE
0
11
21
@DCASE_Challenge
DCASE Challenge
4 years
📢 DCASE challenge 2022 task descriptions are out!! Enjoy ➡️
0
12
22