Paper Title
Interpretable Filter Learning Using Soft Self-attention For Raw Waveform Speech Recognition
Paper Authors
Paper Abstract
Speech recognition from raw waveform involves learning the spectral decomposition of the signal in the first layer of the neural acoustic model using a convolution layer. In this work, we propose a raw waveform convolutional filter learning approach using soft self-attention. The acoustic filter bank in the proposed model is implemented as a parametric cosine-modulated Gaussian filter bank whose parameters are learned. A network-in-network architecture provides self-attention to generate attention weights over the sub-band filters. The attention-weighted log filter-bank energies are fed to the acoustic model for the task of speech recognition. Experiments are conducted on the Aurora-4 (additive noise with channel artifacts) and CHiME-3 (additive noise with reverberation) databases. In these experiments, the attention-based filter learning approach provides considerable improvements in ASR performance over the baseline mel filter-bank features and other robust front-ends (average relative improvements in word error rate of 7% over the baseline features on the Aurora-4 dataset and 5% on the CHiME-3 database). Using the self-attention weights, we also present an analysis of the interpretability of the filters for the ASR task.
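The abstract names the building blocks (a parametric cosine-modulated Gaussian filter bank, a network-in-network self-attention block over sub-bands, and attention-weighted log filter-bank energies) but not their exact configuration. The following PyTorch sketch is an illustrative reconstruction under assumptions, not the authors' implementation: the kernel length, filter count, window/hop sizes, the 1x1-convolution attention network, and all initializations below are hypothetical choices.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CosModGaussianFilterBank(nn.Module):
    # Learnable filter bank: each kernel is a cosine carrier under a Gaussian
    # envelope, so only a center frequency and a bandwidth are learned per filter.
    def __init__(self, num_filters=40, kernel_size=129):
        super().__init__()
        self.kernel_size = kernel_size
        # Center frequencies in cycles/sample (kept below Nyquist = 0.5) and
        # per-filter Gaussian widths in samples; both initializations are assumptions.
        self.center_freq = nn.Parameter(torch.linspace(0.005, 0.45, num_filters))
        self.log_sigma = nn.Parameter(torch.full((num_filters,), math.log(10.0)))

    def forward(self, waveform):  # waveform: (batch, 1, samples)
        t = torch.arange(self.kernel_size, dtype=waveform.dtype,
                         device=waveform.device) - (self.kernel_size - 1) / 2
        sigma = torch.exp(self.log_sigma).unsqueeze(1)             # (filters, 1)
        mu = self.center_freq.unsqueeze(1)                         # (filters, 1)
        envelope = torch.exp(-0.5 * (t.unsqueeze(0) / sigma) ** 2)
        kernels = torch.cos(2 * math.pi * mu * t.unsqueeze(0)) * envelope
        return F.conv1d(waveform, kernels.unsqueeze(1),
                        padding=self.kernel_size // 2)             # (batch, filters, samples)


class SubbandSelfAttention(nn.Module):
    # Network-in-network style block: 1x1 convolutions act as a per-frame MLP
    # that scores each sub-band; a softmax over the filter axis turns the
    # scores into soft attention weights.
    def __init__(self, num_filters=40, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv1d(num_filters, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(hidden, num_filters, kernel_size=1),
        )

    def forward(self, log_energies):  # (batch, filters, frames)
        weights = torch.softmax(self.score(log_energies), dim=1)
        return weights * log_energies, weights


class RawWaveformFrontEnd(nn.Module):
    # Full front-end: filter bank -> short-time energies -> log -> attention
    # weighting. The weighted log energies would feed the acoustic model.
    def __init__(self, num_filters=40):
        super().__init__()
        self.fbank = CosModGaussianFilterBank(num_filters)
        self.attention = SubbandSelfAttention(num_filters)

    def forward(self, waveform):
        filtered = self.fbank(waveform)
        # ~25 ms windows with a 10 ms hop at 16 kHz (400 / 160 samples).
        energies = F.avg_pool1d(filtered ** 2, kernel_size=400, stride=160)
        log_energies = torch.log(energies + 1e-6)
        return self.attention(log_energies)


# Example: a batch of 1-second, 16 kHz waveforms.
frontend = RawWaveformFrontEnd(num_filters=40)
features, attn = frontend(torch.randn(8, 1, 16000))
print(features.shape, attn.shape)  # torch.Size([8, 40, 98]) for both
```

One consequence of this parameterization is worth noting: because each filter is constrained to a smooth band-pass shape defined by just a center frequency and a bandwidth, the per-sub-band attention weights have a direct frequency interpretation, which is what enables the interpretability analysis mentioned in the abstract.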