Paper Title
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing
Paper Authors
Paper Abstract
We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise into ground truth labels during training in order to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate these two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that both techniques improve the generalization of RNNT models individually and that they can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.
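The abstract describes the two techniques only at a high level. The following is a minimal sketch of how they could look in code; the function names, probability parameters, and the uniform sampling over n-best hypotheses are all illustrative assumptions, not the paper's exact formulation.

```python
import random
import numpy as np

def length_perturb(features, drop_prob=0.05, insert_prob=0.05, rng=None):
    """Length perturbation sketch: walk over the frames of an utterance,
    randomly dropping some and inserting (duplicating) others, so the
    output feature sequence has a different length than the input.
    The per-frame probabilities here are hypothetical."""
    rng = rng or np.random.default_rng()
    out = []
    for frame in features:
        if rng.random() < drop_prob:
            continue               # drop this frame
        out.append(frame)
        if rng.random() < insert_prob:
            out.append(frame)      # insert a duplicate of this frame
    return np.array(out)

def nbest_label_smooth(reference, nbest_hypotheses, noise_prob=0.1, rng=None):
    """N-best based label smoothing sketch: with probability noise_prob,
    replace the ground-truth label sequence with one of the decoder's
    n-best hypotheses (assumption: chosen uniformly at random)."""
    rng = rng or random.Random()
    if nbest_hypotheses and rng.random() < noise_prob:
        return rng.choice(nbest_hypotheses)
    return reference
```

During training, `length_perturb` would be applied to the acoustic features of each utterance and `nbest_label_smooth` to its transcript, independently per example; the abstract notes the two can be combined and are complementary.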