ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer

Paper title

ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer

Authors

Chen, Hongkai; Luo, Zixin; Zhou, Lei; Tian, Yurun; Zhen, Mingmin; Fang, Tian; McKinnon, David; Tsin, Yanghai; Quan, Long

Abstract

Generating robust and reliable correspondences across images is a fundamental task for a wide range of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher built on a hierarchical attention structure. It adopts a novel attention operation that adjusts its attention span in a self-adaptive manner. To achieve this, flow maps are first regressed in each cross-attention phase to locate the center of the search region. Next, a sampling grid is generated around that center; its size is not empirically fixed but is computed adaptively from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across the two images within the derived regions, referred to as the attention span. By these means, we are able not only to maintain long-range dependencies, but also to enable fine-grained attention among highly relevant pixels, which compensates for the essential locality and piecewise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.
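
The three steps in the abstract (a flow-predicted center, an uncertainty-sized sampling grid, and attention restricted to that grid) can be illustrated for a single query pixel with the minimal sketch below. It is not the authors' implementation. The function names, the span multiplier `k`, the grid resolution `n`, the bilinear sampler, and the use of the sampled features as both keys and values (no learned projections, a single head) are all assumptions made for illustration. The flow center and the per-axis uncertainty `sigma` are taken as given inputs rather than regressed by a network.

```python
import math

def adaptive_span_grid(center, sigma, n=5, k=2.0):
    """Return n*n sample points covering center +/- k*sigma on each axis (n >= 2)."""
    cx, cy = center
    sx, sy = sigma
    # Normalised offsets evenly spaced in [-1, 1]; the span widens with sigma.
    steps = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(cx + k * sx * u, cy + k * sy * v) for v in steps for u in steps]

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat[row][col] (a channel list) at (x, y), clamped to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    return [
        (1 - ay) * ((1 - ax) * a + ax * b) + ay * ((1 - ax) * c + ax * d)
        for a, b, c, d in zip(feat[y0][x0], feat[y0][x1], feat[y1][x0], feat[y1][x1])
    ]

def span_attention(query, feat, center, sigma, n=5, k=2.0):
    """Attend from one query vector to features sampled inside the adaptive span."""
    points = adaptive_span_grid(center, sigma, n, k)
    keys = [bilinear_sample(feat, x, y) for x, y in points]
    # Scaled dot-product logits between the query and every sampled key.
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * c for q, c in zip(query, key)) for key in keys]
    # Numerically stable softmax over the sampled points.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Assumption: values are the sampled features themselves.
    out = [sum(w * key[c] for w, key in zip(weights, keys)) for c in range(len(query))]
    return out, weights, points
```

In this sketch a confident flow estimate (small `sigma`) concentrates the same number of samples in a tight neighbourhood, while an uncertain one spreads them over a wider region, which is the trade-off between fine-grained and long-range context that the abstract describes.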
