Paper Title
MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences
Paper Authors
Paper Abstract
Human communication is multimodal in nature; it is through multiple modalities, such as language, voice, and facial expressions, that opinions and emotions are expressed. Data in this domain exhibits complex multi-relational and temporal interactions. Learning from this data is a fundamentally challenging research problem. In this paper, we propose Modal-Temporal Attention Graph (MTAG). MTAG is an interpretable graph-based neural model that provides a suitable framework for analyzing multimodal sequential data. We first introduce a procedure to convert unaligned multimodal sequence data into a graph with heterogeneous nodes and edges that captures the rich interactions across modalities and through time. Then, a novel graph fusion operation, called MTAG fusion, along with a dynamic pruning and read-out technique, is designed to efficiently process this modal-temporal graph and capture various interactions. By learning to focus only on the important interactions within the graph, MTAG achieves state-of-the-art performance on multimodal sentiment analysis and emotion recognition benchmarks, while utilizing significantly fewer model parameters.
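The abstract's central procedure, converting unaligned per-modality sequences into a single graph with heterogeneous nodes and edges, can be sketched concretely. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each modality has already been projected into a shared hidden dimension, and it assumes edges are typed by the (source modality, target modality, temporal direction) triple; the function name `build_modal_temporal_graph` and all constants are hypothetical.

```python
# A minimal sketch of the modal-temporal graph construction described in
# the abstract. All details below (shared hidden size, edge typing by
# modality pair and temporal direction, full connectivity) are assumptions
# for illustration, not the authors' exact procedure.
import itertools
import torch

HIDDEN = 32  # assumed shared node dimension; each modality would first be
             # projected into this space

def build_modal_temporal_graph(sequences):
    """sequences: dict of modality name -> tensor of shape (T_m, HIDDEN),
    where each modality may have a different, unaligned length T_m."""
    feats, modality_of, time_of = [], [], []
    for m, seq in sequences.items():
        for t, step in enumerate(seq):         # one node per timestep
            feats.append(step)
            modality_of.append(m)
            time_of.append(t)
    x = torch.stack(feats)                     # node features, (N, HIDDEN)
    n = x.size(0)

    # One edge type per (source modality, target modality, temporal
    # direction) triple -- one way to realize "heterogeneous edges".
    mods = list(sequences)
    directions = ("past", "present", "future")
    etype_id = {k: i for i, k in enumerate(
        itertools.product(mods, mods, directions))}

    src, dst, etype = [], [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Comparing raw timestep indices across modalities with
            # different sampling rates is a simplification.
            d = ("present" if time_of[i] == time_of[j]
                 else "past" if time_of[i] < time_of[j]
                 else "future")
            src.append(i)
            dst.append(j)
            etype.append(etype_id[(modality_of[i], modality_of[j], d)])
    edge_index = torch.tensor([src, dst])      # (2, E), fully connected
    return x, edge_index, torch.tensor(etype)

# Example: three unaligned modalities with different sequence lengths.
x, edge_index, etype = build_modal_temporal_graph({
    "language": torch.randn(4, HIDDEN),
    "audio":    torch.randn(7, HIDDEN),
    "vision":   torch.randn(5, HIDDEN),
})
```

A graph attention layer conditioned on the edge-type ids, together with the dynamic pruning and read-out steps the abstract mentions, would then operate on this structure; those components are not sketched here.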