Paper Title


Joint learning of object graph and relation graph for visual question answering

Authors

Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun

Abstract


Modeling visual question answering (VQA) through scene graphs can significantly improve reasoning accuracy and interpretability. However, existing models perform poorly on complex reasoning questions involving attributes or relations, leading to the false attribute selection or missing relation shown in Figure 1(a). This is because these models cannot balance the various kinds of information in scene graphs and neglect relation and attribute information. In this paper, we introduce a novel Dual Message-passing enhanced Graph Neural Network (DM-GNN), which obtains a balanced representation by properly encoding multi-scale scene graph information. Specifically, we (i) transform the scene graph into two graphs with diversified focuses on objects and relations, and design a dual structure to encode them, which increases the weight of relations; (ii) fuse the encoder output with attribute features, which increases the weight of attributes; and (iii) propose a message-passing mechanism to enhance the information transfer between objects, relations, and attributes. We conduct extensive experiments on datasets including GQA, VG, and Motif-VG, and achieve new state-of-the-art results.
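To make the three components concrete, below is a minimal numpy sketch of the general idea: one message-passing round over an object-centric graph and a relation-centric graph (the dual structure), followed by fusing attribute features into the object representation. All function names, the mean-aggregation rule, and the fusion weight `alpha` are illustrative assumptions, not the paper's actual DM-GNN architecture.

```python
import numpy as np

def message_pass(node_feats, adj, w):
    # One GNN round: mean-aggregate neighbor features over the adjacency
    # matrix, add a self-connection, then apply a linear map + ReLU.
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    agg = adj @ node_feats / deg
    return np.maximum((node_feats + agg) @ w, 0)

def dual_encode(obj_feats, rel_feats, attr_feats, obj_adj, rel_adj,
                w_o, w_r, alpha=0.5):
    """Hypothetical dual encoding step.

    obj_adj  : object-centric graph (objects linked by relations).
    rel_adj  : relation-centric graph (relations linked when they
               share an object), which upweights relation information.
    attr_feats is fused into the object branch to upweight attributes.
    """
    obj_h = message_pass(obj_feats, obj_adj, w_o)   # object branch
    rel_h = message_pass(rel_feats, rel_adj, w_r)   # relation branch
    # Attribute fusion: convex combination with projected attributes
    # (alpha is an assumed hyperparameter, not from the paper).
    obj_h = alpha * obj_h + (1 - alpha) * (attr_feats @ w_o)
    return obj_h, rel_h
```

In this sketch the relation branch runs on its own graph rather than treating relations as mere edge labels, which is one plausible way to realize the paper's claim of increasing the weight of relation information.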
