Analysis of Noisy-target Training for DNN-based speech enhancement

Paper title

Analysis of Noisy-target Training for DNN-based speech enhancement

Authors

Takuya Fujimura, Tomoki Toda

Abstract

Deep neural network (DNN)-based speech enhancement usually uses clean speech as a training target. However, it is hard to collect large amounts of clean speech because recording it is very costly. In other words, the performance of current speech enhancement has been limited by the amount of training data. To relax this limitation, Noisy-target Training (NyTT), which uses noisy speech as a training target, has been proposed. Although it has been shown experimentally that NyTT can train a DNN without clean speech, no detailed analysis has been conducted and its behavior is not well understood. In this paper, we conduct various analyses to deepen our understanding of NyTT. In addition, based on the properties of NyTT, we propose a refined method that is comparable to the method using clean speech. Furthermore, we show that we can improve performance by using a huge amount of noisy speech together with clean speech.
