Paper Title

A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training

Authors

Zhihao Fan, Zhongyu Wei, Jingjing Chen, Siyuan Wang, Zejun Li, Jiarong Xu, Xuanjing Huang

Abstract

Multi-modal pre-training and knowledge discovery are two important research topics in multi-modal machine learning. Nevertheless, none of the existing works attempts to link knowledge discovery with knowledge-guided multi-modal pre-training. In this paper, we propose to unify them into a continuous learning framework for mutual improvement. Taking open-domain uni-modal datasets of images and texts as input, we maintain a knowledge graph as the foundation to support these two tasks. For knowledge discovery, a pre-trained model is used to identify cross-modal links on the graph. For model pre-training, the knowledge graph is used as external knowledge to guide model updating. These two steps are performed iteratively in our framework for continuous learning. Experimental results on MS-COCO and Flickr30K, with respect to both knowledge discovery and the pre-trained model, validate the effectiveness of our framework.
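
To make the alternation concrete, below is a minimal Python sketch of the loop described in the abstract. All names here (`discover_cross_modal_links`, `pretrain_with_knowledge`, `continuous_learning`, the random scorer) are hypothetical placeholders chosen for illustration, not the authors' implementation; the sketch only shows how knowledge discovery and knowledge-guided pre-training can alternate over a shared set of cross-modal links.

```python
import random


def discover_cross_modal_links(score_fn, images, texts, kg_links, threshold=0.5):
    """Knowledge discovery step: score every image-text pair with the current
    model and add pairs above the threshold as new cross-modal links."""
    for img in images:
        for txt in texts:
            if score_fn(img, txt) > threshold:
                kg_links.add((img, txt))
    return kg_links


def pretrain_with_knowledge(model_state, kg_links):
    """Pre-training step (placeholder): in the real framework the knowledge
    graph guides the model update; here we only record that an update happened."""
    model_state["updates"] += len(kg_links)
    return model_state


def continuous_learning(score_fn, model_state, images, texts, rounds=3):
    """Iterate the two steps: discover links with the current model, then use
    the enriched graph as external knowledge for the next round of pre-training."""
    kg_links = set()
    for _ in range(rounds):
        kg_links = discover_cross_modal_links(score_fn, images, texts, kg_links)
        model_state = pretrain_with_knowledge(model_state, kg_links)
    return model_state, kg_links


# Toy usage: a random scorer stands in for the pre-trained image-text model.
state, links = continuous_learning(
    score_fn=lambda img, txt: random.random(),
    model_state={"updates": 0},
    images=["img_0", "img_1"],
    texts=["a dog on grass", "a red car"],
)
```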
