Paper Title
An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites
Paper Authors
Paper Abstract
Recognizing artworks in a cultural site using images acquired from the user's point of view (First Person Vision) allows building interesting applications for both visitors and site managers. However, current object detection algorithms working in fully supervised settings need to be trained with large quantities of labeled data, whose collection requires a lot of time and high costs to achieve good performance. Using synthetic data generated from the 3D model of the cultural site to train the algorithms can reduce these costs. On the other hand, when these models are tested on real images, a significant drop in performance is observed due to the differences between real and synthetic images. In this study we consider the problem of Unsupervised Domain Adaptation for object detection in cultural sites. To address this problem, we created a new dataset containing both synthetic and real images of 16 different artworks. We then investigated different domain adaptation techniques based on one-stage and two-stage object detectors, image-to-image translation, and feature alignment. Based on the observation that single-stage detectors are more robust to the domain shift in the considered setting, we propose a new method, which we call DA-RetinaNet, that builds on RetinaNet and feature alignment. The proposed approach achieves better results than the compared methods on the proposed dataset and on Cityscapes. To support research in this field we release the dataset at https://iplab.dmi.unict.it/EGO-CH-OBJ-UDA/ and the code of the proposed architecture at https://github.com/fpv-iplab/DA-RetinaNet.
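The abstract only names the ingredients of DA-RetinaNet (RetinaNet plus feature alignment between the labeled synthetic domain and the unlabeled real domain); it does not spell out the architecture. A common way to realize such feature alignment in a one-stage detector is adversarial training with a gradient reversal layer and a domain classifier on the detector's feature maps. The PyTorch sketch below illustrates this general pattern, not the authors' exact design; `detector`, `DomainClassifier`, `training_step`, and the loss weighting `lambd` are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradientReversal(Function):
    """Identity in the forward pass; reverses (and scales) gradients in
    the backward pass, so the feature extractor learns features that
    confuse the domain classifier (adversarial feature alignment)."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainClassifier(nn.Module):
    """Small convolutional head predicting, per spatial location, whether
    a feature map comes from the synthetic (0) or real (1) domain."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 1, 1),  # per-location domain logit
        )

    def forward(self, features, lambd=1.0):
        x = GradientReversal.apply(features, lambd)
        return self.net(x)

def training_step(detector, domain_clf, synth_images, synth_targets,
                  real_images, lambd=0.1):
    # Hypothetical interface: `detector` is assumed to return its
    # feature maps along with a standard RetinaNet detection loss
    # (computed only on the labeled synthetic images).
    synth_feats, det_loss = detector(synth_images, synth_targets)
    real_feats, _ = detector(real_images)

    bce = nn.BCEWithLogitsLoss()
    d_synth = domain_clf(synth_feats, lambd)
    d_real = domain_clf(real_feats, lambd)
    domain_loss = bce(d_synth, torch.zeros_like(d_synth)) + \
                  bce(d_real, torch.ones_like(d_real))

    # The detector minimizes det_loss while, through the reversed
    # gradients, maximizing the domain classifier's confusion.
    return det_loss + domain_loss
```

The appeal of the gradient reversal formulation is that the min-max game between the feature extractor and the domain classifier is trained with a single backward pass, and no target-domain labels are needed, which matches the unsupervised setting described in the abstract.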