Paper Title
Rethinking Visual Geo-localization for Large-Scale Applications
Paper Authors
Paper Abstract
Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform on a real-world city-wide VG application, we build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases, 30x larger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets, so we design a new, highly scalable training technique, called CosPlace, which casts training as a classification problem, avoiding the expensive mining required by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to the previous state of the art, CosPlace requires roughly 80% less GPU memory at training time and achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. The dataset, code, and trained models are available for research purposes at https://github.com/gmberton/CosPlace.
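The abstract's central idea, replacing contrastive learning (and its hard-negative mining) with a classification objective over location classes, can be illustrated with a short sketch. Below is a minimal PyTorch example assuming a CosFace-style large-margin classifier over geographic classes; identifiers such as `CosFaceHead`, the descriptor dimension, and all hyperparameters are illustrative assumptions, not the repository's actual API (see https://github.com/gmberton/CosPlace for the real implementation).

```python
# Minimal sketch of classification-based training for visual geo-localization,
# in the spirit of CosPlace: the database is partitioned into location classes
# and a margin-based (CosFace-style) classifier is trained over them, so no
# pair/triplet mining is needed. All names and values here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class CosFaceHead(nn.Module):
    """Large-margin cosine classifier over location classes (hypothetical)."""
    def __init__(self, dim, num_classes, scale=30.0, margin=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.scale, self.margin = scale, margin

    def forward(self, descriptors, labels):
        # Cosine similarity between L2-normalized descriptors and class centers.
        logits = F.linear(F.normalize(descriptors), F.normalize(self.weight))
        # Subtract the margin from the ground-truth class logit only (CosFace).
        one_hot = F.one_hot(labels, logits.size(1)).to(logits.dtype)
        return self.scale * (logits - self.margin * one_hot)

# Backbone producing a compact image descriptor; the dimension is a free
# choice (the paper reports strong results even with small descriptors).
backbone = torchvision.models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 512)

num_location_classes = 1000  # e.g., geographic cells covering the training area
head = CosFaceHead(dim=512, num_classes=num_location_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-4
)

# One training step on a dummy batch: each image only needs its class label,
# derived from where it was taken, rather than mined positive/negative pairs.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_location_classes, (8,))
optimizer.zero_grad()
logits = head(backbone(images), labels)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

The point of this formulation is that a batch consists only of images and their location-class labels; no hard-negative mining across the whole database is required, which is what lets training scale to city-sized datasets such as San Francisco eXtra Large.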