Paper Title
GA2MIF: Graph and Attention Based Two-Stage Multi-Source Information Fusion for Conversational Emotion Detection
Paper Authors
Paper Abstract
Multimodal Emotion Recognition in Conversation (ERC) plays an influential role in the fields of human-computer interaction and conversational robotics, since it can motivate machines to provide empathetic services. Multimodal data modeling has become an up-and-coming research area in recent years, inspired by the human capability to integrate multiple senses. Several graph-based approaches claim to capture interactive information between modalities, but the heterogeneity of multimodal data prevents these methods from reaching optimal solutions. In this work, we introduce a multimodal fusion approach named Graph and Attention based Two-stage Multi-source Information Fusion (GA2MIF) for emotion detection in conversation. Our proposed method circumvents the problem of taking a heterogeneous graph as model input while eliminating complex redundant connections during graph construction. GA2MIF focuses on contextual modeling and cross-modal modeling by leveraging Multi-head Directed Graph ATtention networks (MDGATs) and Multi-head Pairwise Cross-modal ATtention networks (MPCATs), respectively. Extensive experiments on two public datasets (i.e., IEMOCAP and MELD) demonstrate that the proposed GA2MIF effectively captures intra-modal long-range contextual information and inter-modal complementary information, and outperforms prevalent State-Of-The-Art (SOTA) models by a remarkable margin.
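The abstract describes a two-stage pipeline: per-modality contextual modeling (MDGATs) followed by pairwise cross-modal fusion (MPCATs). The sketch below illustrates this data flow in plain PyTorch; the class names (`ContextStage`, `CrossModalStage`), dimensions, and the use of standard self-attention in place of directed graph attention over a constructed conversation graph are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage fusion flow described in the abstract.
# Assumptions: ContextStage/CrossModalStage and all hyperparameters are
# hypothetical; the paper's MDGATs operate on a directed conversation graph,
# approximated here with plain self-attention over the utterance sequence.
import torch
import torch.nn as nn


class ContextStage(nn.Module):
    """Stage 1 stand-in for an MDGAT: intra-modal contextual modeling."""

    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_utterances, hidden_dim) features of one modality
        ctx, _ = self.attn(x, x, x)   # capture long-range context within the modality
        return self.norm(x + ctx)     # residual connection


class CrossModalStage(nn.Module):
    """Stage 2 stand-in for an MPCAT: pairwise cross-modal attention."""

    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, query_mod: torch.Tensor, key_mod: torch.Tensor) -> torch.Tensor:
        # query_mod attends to key_mod to absorb complementary information
        fused, _ = self.attn(query_mod, key_mod, key_mod)
        return self.norm(query_mod + fused)


if __name__ == "__main__":
    batch, utts, dim = 2, 10, 128
    text = torch.randn(batch, utts, dim)   # text-modality utterance features
    audio = torch.randn(batch, utts, dim)  # audio-modality utterance features
    video = torch.randn(batch, utts, dim)  # video-modality utterance features

    context = ContextStage(dim)
    cross = CrossModalStage(dim)

    # Stage 1: intra-modal contextual modeling per modality
    text_c, audio_c, video_c = context(text), context(audio), context(video)

    # Stage 2: pairwise cross-modal modeling (text attends to audio and video)
    text_fused = cross(cross(text_c, audio_c), video_c)
    print(text_fused.shape)  # torch.Size([2, 10, 128])
```

In a full system one would likely instantiate separate stage-1 modules per modality and fuse all pairwise directions before classification; here a single shared module is reused only to keep the sketch short.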