Paper title
A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition
Paper authors
Paper abstract
Speech emotion recognition (SER) is an essential part of human-computer interaction. In this paper, we propose an SER network based on a Graph Isomorphism Network with Weighted Multiple Aggregators (WMA-GIN), which effectively handles the information confusion that arises when neighbour nodes' features are aggregated together in the GIN structure. Moreover, a Full-Adjacent (FA) layer is adopted to alleviate the over-squashing problem, which exists in all Graph Neural Network (GNN) structures, including GIN. Furthermore, a multi-phase attention mechanism and a multi-loss training strategy are employed to avoid losing useful emotional information in the stacked WMA-GIN layers. We evaluate the proposed WMA-GIN on the popular IEMOCAP dataset. The experimental results show that WMA-GIN outperforms other GNN-based methods and is comparable to some advanced non-graph-based methods, achieving a weighted accuracy (WA) of 72.48% and an unweighted accuracy (UA) of 67.72%.
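The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in SER work: WA is the overall fraction of correctly classified utterances, and UA is the mean of the per-class recalls. The function names and the toy labels below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all utterances that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)  # number of utterances per true class
    hit = defaultdict(int)    # number of correct predictions per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy, imbalanced example (hypothetical labels, not IEMOCAP data).
y_true = ["neutral", "neutral", "neutral", "neutral", "happy", "angry"]
y_pred = ["neutral", "neutral", "neutral", "neutral", "neutral", "angry"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
```

In this toy case WA is 5/6 while UA is 2/3, because the single missed "happy" utterance drives that class's recall to zero. The same effect is one plausible reason a UA can sit below the WA on a class-imbalanced dataset.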