Paper Title
MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences
Paper Authors
Paper Abstract
Human communication is multimodal in nature; it is through multiple modalities, such as language, voice, and facial expressions, that opinions and emotions are expressed. Data in this domain exhibits complex multi-relational and temporal interactions. Learning from this data is a fundamentally challenging research problem. In this paper, we propose Modal-Temporal Attention Graph (MTAG). MTAG is an interpretable graph-based neural model that provides a suitable framework for analyzing multimodal sequential data. We first introduce a procedure to convert unaligned multimodal sequence data into a graph with heterogeneous nodes and edges that captures the rich interactions across modalities and through time. Then, a novel graph fusion operation, called MTAG fusion, along with a dynamic pruning and read-out technique, is designed to efficiently process this modal-temporal graph and capture various interactions. By learning to focus only on the important interactions within the graph, MTAG achieves state-of-the-art performance on multimodal sentiment analysis and emotion recognition benchmarks, while utilizing significantly fewer model parameters.
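The abstract's central procedure, converting unaligned per-modality sequences into a single graph with heterogeneous nodes and edges, can be sketched concretely. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each modality has already been projected into a shared hidden dimension, and it assumes edges are typed by the (source modality, target modality, temporal direction) triple; the function name `build_modal_temporal_graph` and all constants are hypothetical.

```python
# A minimal sketch of the modal-temporal graph construction described in
# the abstract. All details below (shared hidden size, edge typing by
# modality pair and temporal direction, full connectivity) are assumptions
# for illustration, not the authors' exact procedure.
import itertools
import torch

HIDDEN = 32  # assumed shared node dimension; each modality would first be
             # projected into this space

def build_modal_temporal_graph(sequences):
    """sequences: dict of modality name -> tensor of shape (T_m, HIDDEN),
    where each modality may have a different, unaligned length T_m."""
    feats, modality_of, time_of = [], [], []
    for m, seq in sequences.items():
        for t, step in enumerate(seq):         # one node per timestep
            feats.append(step)
            modality_of.append(m)
            time_of.append(t)
    x = torch.stack(feats)                     # node features, (N, HIDDEN)
    n = x.size(0)

    # One edge type per (source modality, target modality, temporal
    # direction) triple -- one way to realize "heterogeneous edges".
    mods = list(sequences)
    directions = ("past", "present", "future")
    etype_id = {k: i for i, k in enumerate(
        itertools.product(mods, mods, directions))}

    src, dst, etype = [], [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Comparing raw timestep indices across modalities with
            # different sampling rates is a simplification.
            d = ("present" if time_of[i] == time_of[j]
                 else "past" if time_of[i] < time_of[j]
                 else "future")
            src.append(i)
            dst.append(j)
            etype.append(etype_id[(modality_of[i], modality_of[j], d)])
    edge_index = torch.tensor([src, dst])      # (2, E), fully connected
    return x, edge_index, torch.tensor(etype)

# Example: three unaligned modalities with different sequence lengths.
x, edge_index, etype = build_modal_temporal_graph({
    "language": torch.randn(4, HIDDEN),
    "audio":    torch.randn(7, HIDDEN),
    "vision":   torch.randn(5, HIDDEN),
})
```

A graph attention layer conditioned on the edge-type ids, together with the dynamic pruning and read-out steps the abstract mentions, would then operate on this structure; those components are not sketched here.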