Multimodal Entity Tagging with Multimodal Knowledge Base

Paper Title

Multimodal Entity Tagging with Multimodal Knowledge Base

Authors

Peng, Hao; Li, Hang; Hou, Lei; Li, Juanzi; Qiao, Chao

Abstract

To advance research on multimodal knowledge bases and multimodal information processing, we propose a new task called multimodal entity tagging (MET) with a multimodal knowledge base (MKB). We also develop a dataset for the task using an existing MKB. An MKB contains entities together with their associated texts and images. In MET, given a text-image pair, one uses the information in the MKB to automatically identify the related entity in that pair. We address the task with an information retrieval paradigm and implement several baselines using state-of-the-art methods from NLP and CV. We conduct extensive experiments and analyze the results. The results show that the task is challenging, but current technologies can achieve relatively high performance. We will release the dataset, code, and models for future research.
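
The retrieval framing in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors, the `tag_entity` function, the MKB dictionary layout, and the weight `alpha` are all assumptions made for illustration. In practice the vectors would presumably come from text and image encoders. The sketch treats the text-image pair as a query and ranks MKB entities by a weighted sum of text and image cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def tag_entity(query_text_vec, query_image_vec, mkb, alpha=0.5):
    """Rank MKB entities for a text-image query, best match first.

    `alpha` (hypothetical) weights text similarity against image similarity.
    """
    scored = []
    for name, entry in mkb.items():
        score = (alpha * cosine(query_text_vec, entry["text"])
                 + (1 - alpha) * cosine(query_image_vec, entry["image"]))
        scored.append((score, name))
    scored.sort(reverse=True)  # highest combined score first
    return [name for _, name in scored]


# Toy MKB: each entity has one text vector and one image vector (made-up values).
TOY_MKB = {
    "Eiffel Tower": {"text": [1.0, 0.0, 0.0], "image": [0.9, 0.1, 0.0]},
    "Golden Gate Bridge": {"text": [0.0, 1.0, 0.0], "image": [0.1, 0.9, 0.0]},
}

ranking = tag_entity([0.8, 0.2, 0.0], [1.0, 0.0, 0.0], TOY_MKB)
print(ranking[0])
```

Combining the two modalities with a single weight is only one possible design; the baselines mentioned in the abstract may score candidates quite differently.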
