Unifying Flow, Stereo and Depth Estimation

Paper Title

Unifying Flow, Stereo and Depth Estimation

Authors

Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, Dacheng Tao, Andreas Geiger

Abstract

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our unified model on the challenging Sintel dataset, and our final model that uses a few additional task-specific refinement steps outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed.
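
To make the idea of "directly comparing feature similarities" concrete, here is a minimal sketch of dense matching along a single 1D scanline. It is not the authors' implementation: the function names (`match_scanline`, `one_hot`), the hand-made one-hot features standing in for Transformer features, and the choice of a softmax-weighted expected position to turn similarities into correspondences are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_scanline(feats_a, feats_b):
    """For each position in view A, return the expected matching position in view B.

    Similarity is a plain dot product between feature vectors; the softmax
    turns the similarities into a matching distribution over positions in B.
    """
    matches = []
    for fa in feats_a:
        scores = [sum(x * y for x, y in zip(fa, fb)) for fb in feats_b]
        probs = softmax(scores)
        matches.append(sum(p * j for j, p in enumerate(probs)))
    return matches


def one_hot(index, size, scale=20.0):
    """Hypothetical stand-in for a discriminative learned feature."""
    return [scale if k == index else 0.0 for k in range(size)]


# Toy data: view B shows the same four "pixels" as view A, shifted right by 2.
ids_a = [0, 1, 2, 3]
ids_b = [9, 8, 0, 1, 2, 3]
feats_a = [one_hot(i, 10) for i in ids_a]
feats_b = [one_hot(i, 10) for i in ids_b]

matches = match_scanline(feats_a, feats_b)
# Displacement per pixel: matched position in B minus position in A.
displacements = [m - i for i, m in enumerate(matches)]
```

In this toy setup every displacement comes out at essentially 2, the shift built into the data. The same matching step could be read as 1D flow or as stereo disparity depending on how the displacement is interpreted, which is the sense in which the tasks share one formulation.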
