Paper Title

Where in the World is this Image? Transformer-based Geo-localization in the Wild

Authors

Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa

Abstract

Predicting the geographic location (geo-localization) of a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include the huge diversity of images due to different environmental scenarios, the drastic variation in the appearance of the same location depending on the time of day, weather, and season, and, more importantly, the fact that the prediction is made from a single image that may contain only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, exchanges information between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively, over the state of the art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.
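The abstract reports "continent-level accuracy" without defining it. In this line of benchmarks, a prediction usually counts as correct when its great-circle distance to the true location falls under a fixed threshold, and 2,500 km is the value commonly used for the continent level. The sketch below illustrates that kind of metric under those assumptions. The 2,500 km threshold, the haversine formula, the spherical Earth radius, and the function names `great_circle_km` and `accuracy_at` are illustrative choices, not details taken from the abstract.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius (spherical approximation)
CONTINENT_THRESHOLD_KM = 2500.0   # assumed continent-level threshold, not stated in the abstract


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at(predictions, ground_truth, threshold_km=CONTINENT_THRESHOLD_KM):
    """Fraction of predictions lying within threshold_km of the matching ground-truth point."""
    if len(predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("predictions and ground_truth must be non-empty and of equal length")
    hits = sum(
        great_circle_km(p[0], p[1], g[0], g[1]) <= threshold_km
        for p, g in zip(predictions, ground_truth)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical predictions: one near the truth (Paris vs. London), one far off (0, 0 vs. New York).
    preds = [(48.86, 2.35), (0.0, 0.0)]
    truth = [(51.51, -0.13), (40.71, -74.00)]
    print(accuracy_at(preds, truth))
```

Passing a smaller `threshold_km` gives the stricter levels of the same metric, so a single function can report every granularity.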
