Paper Title

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Paper Authors

Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Yin Fang, Jeff Pan, Ningyu Zhang, Wen Zhang

Paper Abstract

Visual question answering (VQA) often requires an understanding of visual concepts and language semantics, which relies on external knowledge. Most existing methods exploit pre-trained language models and/or unstructured text, but the knowledge in these resources is often incomplete and noisy. Some other methods prefer to use knowledge graphs (KGs), which often contain intensive structured knowledge, but the research is still quite preliminary. In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-text Injection. To effectively incorporate an external KG, we transfer triples into textual format and propose a late injection mechanism for knowledge fusion. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm, which achieves state-of-the-art results on the OKVQA dataset.
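
As a rough illustration of the "knowledge-to-text" step described in the abstract, the Python sketch below verbalizes KG triples into plain sentences and concatenates them with the question, so that an encoder-decoder text-generation model can consume both as ordinary text. The serialization format, the helper names, and the "question: ... knowledge: ..." input template are illustrative assumptions, not the paper's exact implementation.

    # Illustrative sketch (assumptions noted above): turn (head, relation, tail)
    # KG triples into text and append them to the question, producing a single
    # string suitable as input to a text-to-text encoder-decoder model.

    def verbalize_triple(head: str, relation: str, tail: str) -> str:
        """Render one KG triple as a plain-text sentence."""
        # Split camelCase/snake_case relation names into words,
        # e.g. "usedFor" -> "used for".
        spaced = "".join(" " + c.lower() if c.isupper() else c
                         for c in relation.replace("_", " "))
        return f"{head} {spaced.strip()} {tail}."

    def build_model_input(question, triples):
        """Concatenate the question with verbalized knowledge as one string."""
        knowledge = " ".join(verbalize_triple(h, r, t) for h, r, t in triples)
        return f"question: {question} knowledge: {knowledge}"

    if __name__ == "__main__":
        triples = [("umbrella", "usedFor", "keeping dry"),
                   ("rain", "relatedTo", "umbrella")]
        print(build_model_input("Why is the person holding this object?", triples))
        # -> question: Why is the person holding this object?
        #    knowledge: umbrella used for keeping dry. rain related to umbrella.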
