arXiv Sound Profile
arXiv Sound

@ArxivSound

Followers
6K
Following
1
Media
0
Statuses
17K

Sound-related articles (https://t.co/dxVYgWJGOw and https://t.co/b90N0Zzvjs) on https://t.co/HHqPequzVU

Joined July 2020
@ArxivSound
arXiv Sound
3 years
[IMPORTANT] arXiv Sound does not post some papers submitted to arXiv https://t.co/mPAjntoGrG or https://t.co/3pcQCkf6q8. This is because they do not appear in the arXiv RSS feed. We apologize for the inconvenience.
1
0
9
@ArxivSound
arXiv Sound
5 hours
Friedrich Wolf-Monheim, "Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks,"
arxiv.org
Next to decision tree and k-nearest neighbours algorithms deep convolutional neural networks (CNNs) are widely used to classify audio data in many domains like music, speech or environmental...
0
0
2
@ArxivSound
arXiv Sound
5 hours
Paolo Combes, Stefan Weinzierl, Klaus Obermayer, "Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations,"
arxiv.org
Deep learning appears as an appealing solution for Automatic Synthesizer Programming (ASP), which aims to assist musicians and sound designers in programming sound synthesizers. However,...
0
0
0
@ArxivSound
arXiv Sound
5 hours
Patricia Hu, Silvan David Peter, Jan Schlüter, Gerhard Widmer, "Exploring System Adaptations For Minimum Latency Real-Time Piano Transcription,"
arxiv.org
Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline...
0
0
0
@ArxivSound
arXiv Sound
5 hours
Ye Ni, Ruiyu Liang, Xiaoshuai Hao, Jiaming Cheng, Qingyun Wang, Chengwei Huang, Cairong Zou, Wei Zhou, Weiping Ding, Björn W. Schuller, "Affine Modulation-based Audiogram Fusion Network for Joint Noise Reduction and Hearing Loss Compensation,"
arxiv.org
Hearing aids (HAs) are widely used to provide personalized speech enhancement (PSE) services, improving the quality of life for individuals with hearing loss. However, HA performance significantly...
0
0
1
@ArxivSound
arXiv Sound
5 hours
William Chen, Chutong Meng, et al., "The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties,"
arxiv.org
Recent improvements in multilingual ASR have not been equally distributed across languages and language varieties. To advance state-of-the-art (SOTA) ASR models, we present the Interspeech 2025...
0
2
1
@ArxivSound
arXiv Sound
5 hours
Huihong Liang, Dongxuan Jia, Youquan Wang, Longtao Huang, Shida Zhong, Luping Xiang, Lei Huang, Tao Yuan, "Prototype: A Keyword Spotting-Based Intelligent Audio SoC for IoT,"
arxiv.org
In this demo, we present a compact intelligent audio system-on-chip (SoC) integrated with a keyword spotting accelerator, enabling ultra-low latency, low-power, and low-cost voice interaction in...
0
0
0
@ArxivSound
arXiv Sound
7 hours
Ganghui Ru, Jieying Wang, Jiahao Zhao, Yulun Wu, Yi Yu, Nannan Jiang, Wei Wang, Wei Li, "HingeNet: A Harmonic-Aware Fine-Tuning Approach for Beat Tracking,"
arxiv.org
Fine-tuning pre-trained foundation models has made significant progress in music information retrieval. However, applying these models to beat tracking tasks remains underexplored, as the limited...
0
0
0
@ArxivSound
arXiv Sound
7 hours
Thomas Thebaud, Yen-Ju Lu, Matthew Wiesner, Peter Viechnicki, Najim Dehak, "Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM,"
arxiv.org
In dialogue transcription pipelines, Large Language Models (LLMs) are frequently employed in post-processing to improve grammar, punctuation, and readability. We explore a complementary...
0
0
0
@ArxivSound
arXiv Sound
7 hours
Luxi He, Xiangyu Qi, Michel Liao, Inyoung Cheong, Prateek Mittal, Danqi Chen, Peter Henderson, "The Model Hears You: Audio Language Model Deployments Should Consider the Principle of Least Privilege,"
arxiv.org
We are at a turning point for language models that accept audio input. The latest end-to-end audio language models (Audio LMs) process speech directly instead of relying on a separate...
0
1
1
@ArxivSound
arXiv Sound
7 hours
Zhengdong Yang, Shuichiro Shimizu, Yahan Yu, Chenhui Chu, "When Large Language Models Meet Speech: A Survey on Integration Approaches,"
arxiv.org
Recent advancements in large language models (LLMs) have spurred interest in expanding their application beyond text-based tasks. A large number of studies have explored integrating other...
0
1
1
@ArxivSound
arXiv Sound
7 hours
Pengyu Wang, Ying Fang, Xiaofei Li, "VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification,"
arxiv.org
Reverberant speech, denoting the speech signal degraded by reverberation, contains crucial knowledge of both anechoic source speech and room impulse response (RIR). This work proposes a...
0
0
0
@ArxivSound
arXiv Sound
1 day
Davide Berghi, Philip J. B. Jackson, "Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos,"
arxiv.org
In this study, we address the multimodal task of stereo sound event localization and detection with source distance estimation (3D SELD) in regular video content. 3D SELD is a complex task that...
0
0
0
@ArxivSound
arXiv Sound
1 day
Liping Chen, Kong Aik Lee, Zhen-Hua Ling, Xin Wang, Rohan Kumar Das, Tomoki Toda, Haizhou Li, "Speaker Privacy and Security in the Big Data Era: Protection and Defense against Deepfake,"
arxiv.org
In the era of big data, remarkable advancements have been achieved in personalized speech generation techniques that utilize speaker attributes, including voice and speaking style, to generate...
0
0
0
@ArxivSound
arXiv Sound
1 day
Marvin Lavechin, Thomas Hueber, "From perception to production: how acoustic invariance facilitates articulatory learning in a self-supervised vocal imitation model,"
arxiv.org
Human infants face a formidable challenge in speech acquisition: mapping extremely variable acoustic inputs into appropriate articulatory movements without explicit instruction. We present a...
0
0
0