Paper Title
A Deep Multi-Level Attentive Network for Multimodal Sentiment Analysis
Paper Authors
Paper Abstract
Multimodal sentiment analysis has attracted increasing attention owing to its broad application prospects. Existing methods focus on a single modality, which fails to capture social media content that spans multiple modalities. Moreover, in multimodal learning, most works simply combine the two modalities without exploring the complicated correlations between them, which results in unsatisfactory performance on multimodal sentiment classification. Motivated by this status quo, we propose a Deep Multi-Level Attentive Network that exploits the correlation between image and text modalities to improve multimodal learning. Specifically, we generate a bi-attentive visual map along the spatial and channel dimensions to amplify the representation power of CNNs. We then model the correlation between image regions and word semantics by applying semantic attention to extract the textual features related to the bi-attentive visual features. Finally, self-attention is employed to automatically select the sentiment-rich multimodal features for classification. We conduct extensive evaluations on four real-world datasets, namely MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, which verify the superiority of our method.
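The abstract outlines a three-stage attention pipeline: channel and spatial attention over CNN feature maps, semantic attention from the attended visual features onto word features, and self-attention for multimodal fusion. The following PyTorch sketch illustrates one plausible reading of that pipeline; the module names, dimensions, pooling choices, and fusion details are assumptions for illustration and do not reproduce the authors' exact model.

```python
# Hypothetical sketch of the multi-level attention pipeline described in the abstract.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over CNN feature maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # (B, C) channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)


class SpatialAttention(nn.Module):
    """1x1-conv spatial attention that re-weights image regions."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        w = torch.sigmoid(self.conv(x))        # (B, 1, H, W) spatial weights
        return x * w


class SemanticAttention(nn.Module):
    """Attend over word features using the bi-attentive visual summary as query."""
    def __init__(self, visual_dim, text_dim):
        super().__init__()
        self.query = nn.Linear(visual_dim, text_dim)

    def forward(self, visual, words):          # visual: (B, Dv), words: (B, T, Dt)
        q = self.query(visual).unsqueeze(1)    # (B, 1, Dt)
        scores = (q * words).sum(-1)           # (B, T) word relevance scores
        alpha = torch.softmax(scores, dim=-1)  # attention weights over words
        return (alpha.unsqueeze(-1) * words).sum(1)  # (B, Dt) attended text feature


class MultiLevelAttentiveNet(nn.Module):
    """Toy end-to-end model: bi-attentive visual map -> semantic attention -> self-attention fusion."""
    def __init__(self, channels=2048, text_dim=300, hidden=512, num_classes=3):
        super().__init__()
        self.channel_att = ChannelAttention(channels)
        self.spatial_att = SpatialAttention(channels)
        self.semantic_att = SemanticAttention(channels, text_dim)
        self.proj_v = nn.Linear(channels, hidden)
        self.proj_t = nn.Linear(text_dim, hidden)
        self.self_att = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, feat_map, words):        # feat_map: (B, C, H, W), words: (B, T, Dt)
        v = self.spatial_att(self.channel_att(feat_map))    # bi-attentive visual map
        v = v.mean(dim=(2, 3))                               # (B, C) pooled visual feature
        t = self.semantic_att(v, words)                      # text features tied to visual cues
        tokens = torch.stack([self.proj_v(v), self.proj_t(t)], dim=1)  # (B, 2, hidden)
        fused, _ = self.self_att(tokens, tokens, tokens)     # self-attention over the two modalities
        return self.classifier(fused.mean(dim=1))            # sentiment logits


# Example with random tensors standing in for CNN feature maps and word embeddings.
model = MultiLevelAttentiveNet()
logits = model(torch.randn(2, 2048, 7, 7), torch.randn(2, 20, 300))
print(logits.shape)  # torch.Size([2, 3])
```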