Paper Title
Small energy masking for improved neural network training for end-to-end speech recognition
Paper Authors
Paper Abstract
In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs having values below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold. A uniform distribution is employed to randomly generate the ratio of this energy threshold to the peak filterbank energy of each utterance in decibels. The unmasked feature elements are scaled so that the total sum of the feature values remains the same through this masking procedure. This very simple algorithm shows relative 11.2 % and 13.5 % Word Error Rate (WER) improvements on the standard LibriSpeech test-clean and test-other sets over the baseline end-to-end speech recognition system. Additionally, compared to the input dropout algorithm, the SEM algorithm shows relative 7.7 % and 11.6 % improvements on the same LibriSpeech test-clean and test-other sets. With a modified shallow-fusion technique with a Transformer LM, we obtained a 2.62 % WER on the LibriSpeech test-clean set and a 7.87 % WER on the LibriSpeech test-other set.
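To make the masking procedure concrete, below is a minimal NumPy sketch of the idea described in the abstract. It assumes the input is a (time, frequency) matrix of linear filterbank energies; the function name, the dB sampling range, and other parameter details are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def small_energy_masking(features, db_low=-80.0, db_high=0.0, rng=None):
    """Sketch of Small Energy Masking (SEM) for one utterance.

    features: (time, frequency) array of linear filterbank energies.
    db_low, db_high: assumed range for the uniformly sampled
        threshold-to-peak ratio in decibels (not the paper's values).
    """
    if rng is None:
        rng = np.random.default_rng()

    # Sample the ratio of the energy threshold to the peak filterbank
    # energy, in dB, from a uniform distribution (one draw per utterance).
    threshold_db = rng.uniform(db_low, db_high)

    # Convert the dB ratio to an absolute energy threshold relative to
    # this utterance's peak filterbank energy.
    peak_energy = features.max()
    threshold = peak_energy * 10.0 ** (threshold_db / 10.0)

    # Mask every time-frequency bin whose energy is below the threshold.
    mask = features >= threshold
    masked = features * mask

    # Scale the surviving bins so the total sum of feature values is
    # unchanged by the masking procedure.
    masked_sum = masked.sum()
    if masked_sum > 0:
        masked *= features.sum() / masked_sum
    return masked
```

Sampling a fresh threshold per utterance makes SEM act as a data-augmentation step applied during training only; at inference time the features would be passed through unmodified.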