Assisting Scene Graph Generation with Self-Supervision

Paper title

Assisting Scene Graph Generation with Self-Supervision

Authors

Sandeep Inuganti, Vineeth N Balasubramanian

Abstract

Research in scene graph generation has quickly gained traction in the past few years because of its potential to help in downstream tasks such as visual question answering and image captioning. Many interesting approaches have been proposed to tackle this problem. Most of these works use a pre-trained object detection model as a preliminary feature extractor, so obtaining object bounding box proposals from the detector is relatively cheap. We take advantage of this ready availability of bounding box annotations produced by the pre-trained detector. We propose a set of three novel yet simple self-supervision tasks and train them as auxiliary tasks alongside the main model. In our comparisons, training the base model from scratch with these self-supervision tasks achieves state-of-the-art results on all metrics and recall settings. Training with the proposed self-supervision losses also resolves some of the confusion between two types of relationships: geometric and possessive. We conduct our experiments and report our results on the Visual Genome benchmark dataset.
