Listening to Sounds of Silence for Speech Denoising

Paper title

Listening to Sounds of Silence for Speech Denoising

Authors

Xu, Ruilin; Wu, Rundi; Ishiwaka, Yuko; Vondrick, Carl; Zheng, Changxi

Abstract

We introduce a deep learning model for speech denoising, a long-standing challenge in audio analysis arising in numerous applications. Our approach is based on a key observation about human speech: there is often a short pause between each sentence or word. In a recorded speech signal, those pauses introduce a series of time periods during which only noise is present. We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. Detected silent intervals over time expose not just pure noise but its time-varying features, allowing the model to learn noise dynamics and suppress it from the speech signal. Experiments on multiple datasets confirm the pivotal role of silent interval detection for speech denoising, and our method outperforms several state-of-the-art denoising methods, including those that accept only audio input (like ours) and those that denoise based on audiovisual input (and hence require more information). We also show that our method enjoys excellent generalization properties, such as denoising spoken languages not seen during training.
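The core observation in the abstract, that pauses in speech expose segments containing only noise, can be illustrated with a toy sketch. This is not the paper's method: the paper learns silent interval detection with a deep model, whereas the sketch below swaps in a fixed energy threshold on a synthetic signal. All function names, the frame length, the threshold, and the synthetic signal are assumptions made for illustration.

```python
import math
import random


def make_demo_signal(seed=0, n_samples=8000, noise_std=0.05):
    """Synthetic mono signal: constant Gaussian noise plus two 'speech' bursts.

    The bursts are plain sine tones (a stand-in for speech) occupying
    samples 1600-3199 and 4800-6399; everything else is noise only.
    """
    rng = random.Random(seed)
    signal = []
    for i in range(n_samples):
        sample = rng.gauss(0.0, noise_std)
        if 1600 <= i < 3200 or 4800 <= i < 6400:
            sample += math.sin(2.0 * math.pi * 200.0 * i / 8000.0)
        signal.append(sample)
    return signal


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[k * frame_len:(k + 1) * frame_len]) / frame_len
        for k in range(n_frames)
    ]


def silent_intervals(energies, threshold):
    """Group consecutive low-energy frames into (start, end) ranges, end exclusive."""
    intervals, start = [], None
    for k, energy in enumerate(energies):
        if energy < threshold and start is None:
            start = k
        elif energy >= threshold and start is not None:
            intervals.append((start, k))
            start = None
    if start is not None:
        intervals.append((start, len(energies)))
    return intervals


def noise_power(energies, intervals):
    """Average frame energy over the detected silent intervals only."""
    frames = [e for s, t in intervals for e in energies[s:t]]
    return sum(frames) / len(frames) if frames else 0.0


if __name__ == "__main__":
    energies = frame_energies(make_demo_signal(), frame_len=160)
    intervals = silent_intervals(energies, threshold=0.05)
    print("silent intervals (frame indices):", intervals)
    print("estimated noise power:", noise_power(energies, intervals))
```

The noise power estimated from the silent frames approximates the variance of the injected noise (0.05 squared), which is the kind of information a denoiser could then use to suppress noise in the speech segments. A fixed threshold like this would fail on real recordings with loud or non-stationary noise, which is presumably part of the motivation for learning the detection instead.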
