Scalable Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Computer Vision

Paper title

Scalable Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Computer Vision

Authors

Ganguli, Swetava; Iyer, C. V. Krishnakumar; Pandey, Vipul

Abstract

Self-supervised representation learning techniques utilize large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In this work, we propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories to solve downstream geospatial computer vision tasks. Tiles resulting from a raster representation of the earth's surface are modeled as nodes on a graph or pixels of an image. GPS trajectories are modeled as allowed Markovian paths on these nodes. A scalable and distributed algorithm is presented to compute image-like representations, called reachability summaries, of the spatial connectivity patterns between tiles and their neighbors implied by the observed Markovian paths. A convolutional, contractive autoencoder is trained to learn compressed representations, called reachability embeddings, of reachability summaries for every tile. Reachability embeddings serve as task-agnostic feature representations of geographic locations. Using reachability embeddings as pixel representations for five different downstream geospatial tasks, cast as supervised semantic segmentation problems, we quantitatively demonstrate that reachability embeddings are semantically meaningful representations and yield a 4-23% gain in performance, as measured by the area under the precision-recall curve (AUPRC), compared to baseline models that use pixel representations that do not account for the spatial connectivity between tiles. Reachability embeddings transform sequential, spatiotemporal mobility data into semantically meaningful tensor representations that can be combined with other sources of imagery and are designed to facilitate multimodal learning in geospatial computer vision.
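
The abstract does not spell out how a reachability summary is computed, so the following is only a toy, single-machine sketch of the general idea: for each tile, build a small image-like grid counting how often each neighboring tile is reached shortly after the tile is visited. The function name `reachability_summaries` and the parameters `radius` and `horizon` are hypothetical, chosen for illustration, and are not taken from the paper; the paper's algorithm is distributed and may define the summary differently.

```python
from collections import defaultdict


def reachability_summaries(trajectories, radius=1, horizon=2):
    """Toy neighborhood-count summary per tile (illustrative assumption only).

    trajectories: iterable of lists of (row, col) tile indices in time order.
    radius: half-width of the square neighborhood window (hypothetical).
    horizon: how many subsequent steps count as "reached" (hypothetical).
    Returns {tile: (2*radius+1) x (2*radius+1) nested list of counts}.
    """
    size = 2 * radius + 1
    summaries = defaultdict(lambda: [[0] * size for _ in range(size)])
    for path in trajectories:
        for i, (r0, c0) in enumerate(path):
            # Look at the next `horizon` tiles visited after tile (r0, c0).
            for r1, c1 in path[i + 1 : i + 1 + horizon]:
                dr, dc = r1 - r0, c1 - c0
                # Only count tiles that fall inside the neighborhood window.
                if abs(dr) <= radius and abs(dc) <= radius:
                    summaries[(r0, c0)][dr + radius][dc + radius] += 1
    return dict(summaries)


# Two short trajectories over a 2 x 2 grid of tiles.
trajectories = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (1, 0), (1, 1)],
]
summaries = reachability_summaries(trajectories, radius=1, horizon=2)
```

In this sketch, each per-tile grid plays the role of the image-like input that an autoencoder could compress into an embedding; tiles that are only ever the last point of a trajectory get no entry.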
