Paper Title
Generating image captions with external encyclopedic knowledge
Paper Authors
Abstract
Accurately reporting what objects are depicted in an image is largely a solved problem in automatic caption generation. The next big challenge on the way to truly human-like captioning is the ability to incorporate the context of the image and related real-world knowledge. We tackle this challenge by creating an end-to-end caption generation system that makes extensive use of image-specific encyclopedic data. Our approach includes a novel way of using image location to identify relevant open-domain facts in an external knowledge base, with their subsequent integration into the captioning pipeline at both the encoding and decoding stages. Our system is trained and tested on a new dataset with naturally produced knowledge-rich captions, and achieves significant improvements over multiple baselines. We empirically demonstrate that our approach is effective for generating contextualized captions with encyclopedic knowledge that is both factually accurate and relevant to the image.