Alleviating Noisy Data in Image Captioning with Cooperative Distillation

Paper title

Alleviating Noisy Data in Image Captioning with Cooperative Distillation

Paper authors

Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff

Abstract

Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or VizWiz that have accurate descriptions of their corresponding images. Unfortunately, the scarcity of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose a new technique, cooperative distillation, that combines clean curated datasets with the web-scale, automatically extracted captions of the Google Conceptual Captions dataset (GCC), which can have poor descriptions of images but is abundant in size and therefore provides a rich vocabulary, resulting in more expressive captions.
