Paper Title

A Novel Fusion of Attention and Sequence to Sequence Autoencoders to Predict Sleepiness From Speech

Authors

Shahin Amiriparian, Pawel Winokurow, Vincent Karas, Sandra Ottl, Maurice Gerczuk, Björn W. Schuller

Abstract

Motivated by the attention mechanism of the human visual system and recent developments in the field of machine translation, we introduce attention-based and recurrent sequence-to-sequence autoencoders for fully unsupervised representation learning from audio files. In particular, we test the efficacy of our novel approach on the task of speech-based sleepiness recognition. We evaluate the learnt representations from both autoencoders, and then conduct an early fusion to ascertain possible complementarity between them. In our frameworks, we first extract Mel-spectrograms from raw audio files. Second, we train recurrent autoencoders on these spectrograms, which are treated as time-dependent frequency vectors. Afterwards, we extract the activations of specific fully connected layers of the autoencoders, which represent the learnt features of the spectrograms for the corresponding audio instances. Finally, we train support vector regressors on these representations to obtain the predictions. On the development partition of the data, we achieve Spearman's correlation coefficients of .324, .283, and .320 with the targets on the Karolinska Sleepiness Scale by utilising the attention autoencoder, the non-attention autoencoder, and the fusion of both autoencoders' representations, respectively. In the same order, we achieve Spearman's correlation coefficients of .311, .359, and .367 on the test data, indicating the suitability of our proposed fusion strategy.
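The fusion and evaluation steps named in the abstract can be illustrated with a minimal sketch. It assumes that early fusion means concatenating the two autoencoders' feature vectors per audio instance, which is the usual reading of the term but is not spelled out in the abstract. The function names `early_fusion`, `average_ranks`, and `spearman_rho` are hypothetical, the feature matrices are random placeholders rather than real autoencoder activations, and the support vector regressor itself is left out.

```python
import numpy as np


def early_fusion(feats_a, feats_b):
    """Concatenate two per-instance feature matrices along the feature axis.

    Assumption: "early fusion" is taken to mean feature-level concatenation.
    """
    a = np.asarray(feats_a, dtype=float)
    b = np.asarray(feats_b, dtype=float)
    if a.shape[0] != b.shape[0]:
        raise ValueError("both feature sets must describe the same instances")
    return np.concatenate([a, b], axis=1)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    i = 0
    while i < len(v):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(predictions, targets):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rp = average_ranks(predictions)
    rt = average_ranks(targets)
    return float(np.corrcoef(rp, rt)[0, 1])


# Placeholder features standing in for the two autoencoders' activations.
rng = np.random.default_rng(0)
attention_feats = rng.normal(size=(6, 4))   # hypothetical: 6 clips, 4 features
recurrent_feats = rng.normal(size=(6, 3))   # hypothetical: 6 clips, 3 features
fused = early_fusion(attention_feats, recurrent_feats)
print(fused.shape)  # (6, 7)
```

Tie handling matters here because Karolinska Sleepiness Scale targets are discrete ratings, so many instances share the same value. In a full pipeline, the fused matrix would be passed to a regressor, and `spearman_rho` would compare its predictions with the sleepiness ratings.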
