Paper Title
EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual
Paper Authors
Paper Abstract
Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolutional neural network (CNN) for disparity regression, which is inefficient due to high memory consumption and slow inference speed. In this paper, we propose a network named EDNet for efficient disparity estimation. Firstly, we construct a combined volume that incorporates contextual information from the squeezed concatenation volume and feature similarity measurement from the correlation volume. The combined volume can then be aggregated by 2D convolutions, which are faster and require less memory than 3D convolutions. Secondly, we propose an attention-based spatial residual module to generate attention-aware residual features. The attention mechanism provides intuitive spatial evidence about inaccurate regions with the help of error maps at multiple scales, thus improving residual learning efficiency. Extensive experiments on the Scene Flow and KITTI datasets show that EDNet outperforms previous 3D-CNN-based works and achieves state-of-the-art performance with significantly faster speed and lower memory consumption.
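To make the volume construction concrete, below is a minimal PyTorch sketch of the combined-volume idea the abstract describes: a correlation volume built from per-disparity feature similarity, and a 4D concatenation volume squeezed to a 3D tensor so the concatenated result can be aggregated with cheap 2D convolutions. The class and function names, and the choice of a 1x1x1 3D convolution as the squeeze operator, are assumptions for illustration and may differ from the paper's actual layer configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def correlation_volume(left, right, max_disp):
    """Per-disparity feature similarity: (B, C, H, W) -> (B, max_disp, H, W)."""
    B, C, H, W = left.shape
    vol = left.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            vol[:, d] = (left * right).mean(dim=1)
        else:
            vol[:, d, :, d:] = (left[:, :, :, d:] * right[:, :, :, :-d]).mean(dim=1)
    return vol

class CombinedVolume(nn.Module):
    """Hypothetical sketch: squeeze the 4D concatenation volume to 3D and
    concatenate it with the correlation volume, yielding a tensor that
    2D convolutions can aggregate directly."""
    def __init__(self, feat_ch, max_disp):
        super().__init__()
        self.max_disp = max_disp
        # Assumed squeeze: a 1x1x1 3D conv collapsing the 2*C channel axis.
        self.squeeze = nn.Conv3d(2 * feat_ch, 1, kernel_size=1)

    def forward(self, left, right):
        B, C, H, W = left.shape
        concat = left.new_zeros(B, 2 * C, self.max_disp, H, W)
        for d in range(self.max_disp):
            if d == 0:
                concat[:, :C, d] = left
                concat[:, C:, d] = right
            else:
                concat[:, :C, d, :, d:] = left[:, :, :, d:]
                concat[:, C:, d, :, d:] = right[:, :, :, :-d]
        squeezed = self.squeeze(concat).squeeze(1)       # (B, max_disp, H, W)
        corr = correlation_volume(left, right, self.max_disp)
        return torch.cat([squeezed, corr], dim=1)        # (B, 2*max_disp, H, W)
```

And a single-scale sketch of the attention-based spatial residual idea, assuming the error map is the photometric difference between the left image and the right image warped by the current disparity estimate (the paper applies this at multiple scales); `warp`, the gating layers, and their shapes are illustrative assumptions rather than the paper's exact design.

```python
def warp(img, disp):
    """Backward-warp the right image to the left view using disparity (B, 1, H, W)."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=img.device, dtype=img.dtype),
        torch.arange(W, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) - disp.squeeze(1)             # shift x by disparity
    ys = ys.unsqueeze(0).expand_as(xs)
    grid = torch.stack((2 * xs / (W - 1) - 1, 2 * ys / (H - 1) - 1), dim=-1)
    return F.grid_sample(img, grid, align_corners=True)

class AttentionSpatialResidual(nn.Module):
    """Hypothetical sketch: a photometric error map spatially gates the
    residual features, focusing refinement on inaccurate regions."""
    def __init__(self, feat_ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(feat_ch + 1, feat_ch, 3, padding=1), nn.Sigmoid())
        self.res_head = nn.Conv2d(feat_ch, 1, 3, padding=1)

    def forward(self, feat, left_img, right_img, disp):
        error = (left_img - warp(right_img, disp)).abs().mean(1, keepdim=True)
        attn = self.gate(torch.cat([feat, error], dim=1))  # spatial attention
        return disp + self.res_head(feat * attn)           # refined disparity
```

The sigmoid gate keeps the attention weights in [0, 1], so regions where the error map is small pass through largely unchanged while high-error regions receive larger residual corrections.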