A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition

Paper title

A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition

Authors

Hu, Ying; Tang, Yuwu; Huang, Hao; He, Liang

Abstract

Speech emotion recognition (SER) is an essential part of human-computer interaction. In this paper, we propose an SER network based on a Graph Isomorphism Network with Weighted Multiple Aggregators (WMA-GIN), which can effectively handle the information confusion that arises when the features of neighbour nodes are aggregated together in the GIN structure. Moreover, a Full-Adjacent (FA) layer is adopted to alleviate the over-squashing problem, which exists in all Graph Neural Network (GNN) structures, including GIN. Furthermore, a multi-phase attention mechanism and a multi-loss training strategy are employed to avoid losing useful emotional information in the stacked WMA-GIN layers. We evaluate the proposed WMA-GIN on the popular IEMOCAP dataset. The experimental results show that WMA-GIN outperforms other GNN-based methods and is comparable to some advanced non-graph-based methods, achieving a weighted accuracy (WA) of 72.48% and an unweighted accuracy (UA) of 67.72%.
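The abstract does not spell out how the weighted multiple aggregators are combined, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes three aggregators (sum, mean, max), softmax-normalised weights, and a GIN-style self term scaled by (1 + eps); the function name `wma_gin_update` and all parameters are hypothetical, and the MLP that normally follows the aggregation in a GIN layer is omitted.

```python
import math


def softmax(logits):
    """Turn raw aggregator scores into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def agg_sum(vectors):
    return [sum(col) for col in zip(*vectors)]


def agg_mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agg_max(vectors):
    return [max(col) for col in zip(*vectors)]


# Assumed aggregator set; the paper may use a different one.
AGGREGATORS = (agg_sum, agg_mean, agg_max)


def wma_gin_update(features, neighbours, logits, eps=0.0):
    """One hypothetical aggregation step.

    features:   dict node -> feature vector (list of floats)
    neighbours: dict node -> list of neighbour nodes
    logits:     one score per aggregator (learnable in a real model)
    """
    weights = softmax(logits)
    updated = {}
    for node, h in features.items():
        neigh = [features[u] for u in neighbours[node]]
        # GIN-style self term: (1 + eps) * h_v
        combined = [(1.0 + eps) * x for x in h]
        if neigh:
            # Add the weighted combination of all aggregator outputs.
            for w, agg in zip(weights, AGGREGATORS):
                for i, value in enumerate(agg(neigh)):
                    combined[i] += w * value
        updated[node] = combined
    return updated


# Toy path graph 0 - 1 - 2, e.g. three consecutive speech frames.
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [3.0, 1.0]}
neighbours = {0: [1], 1: [0, 2], 2: [1]}
out = wma_gin_update(features, neighbours, logits=[0.0, 0.0, 0.0])
print(out)
```

With equal logits, each aggregator contributes one third of the neighbourhood message. In a trained model the logits would be learned, so the network can favour whichever aggregator separates neighbour information best.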

Paper title