Paper Title
Robust RGB-D Fusion for Saliency Detection
Paper Authors
Paper Abstract
Efficiently exploiting multi-modal inputs for accurate RGB-D saliency detection is a topic of high interest. Most existing works leverage cross-modal interactions to fuse the two streams of RGB-D and enhance intermediate features. In this process, a practical aspect, the low quality of the available depth maps, has not yet been fully considered. In this work, we aim for RGB-D saliency detection that is robust to low-quality depth, which primarily appears in two forms: inaccuracy due to noise, and misalignment with RGB. To this end, we propose a robust RGB-D fusion method that benefits from (1) layer-wise, and (2) trident spatial, attention mechanisms. On the one hand, layer-wise attention (LWA) learns the trade-off between early and late fusion of RGB and depth features, depending on the depth accuracy. On the other hand, trident spatial attention (TSA) aggregates features from a wider spatial context to address the depth misalignment problem. The proposed LWA and TSA mechanisms allow us to efficiently exploit the multi-modal inputs for saliency detection while remaining robust against low-quality depth. Our experiments on five benchmark datasets demonstrate that the proposed fusion method performs consistently better than state-of-the-art fusion alternatives.
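To make the two mechanisms concrete, here is a minimal numpy sketch of the ideas the abstract describes. It is illustrative only: the abstract does not give the paper's actual architecture, so the scalar quality gate in `layer_wise_attention`, the element-wise sum used for early fusion, and the three-dilation averaging in `trident_spatial_attention` are all assumptions on my part.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_wise_attention(rgb_feat, depth_feat, w):
    """Hypothetical LWA sketch: predict a scalar gate from the depth
    features (a proxy for depth quality) and blend an early-fused
    feature with the RGB-only (late-fusion) path accordingly.
    Shapes: rgb_feat, depth_feat are (C, H, W); w is (C,)."""
    desc = depth_feat.mean(axis=(1, 2))        # global-average-pooled descriptor, (C,)
    alpha = sigmoid(float(desc @ w))           # learned-style scalar gate in (0, 1)
    early = rgb_feat + depth_feat              # early fusion: element-wise sum (assumed)
    late = rgb_feat                            # late path: RGB stream alone
    return alpha * early + (1.0 - alpha) * late

def trident_spatial_attention(feat, dilations=(1, 2, 3)):
    """Hypothetical TSA sketch: pool spatial context over 3x3 dilated
    neighborhoods at several rates and average the resulting attention
    maps, so cues from misaligned positions can still contribute."""
    C, H, W = feat.shape
    maps = []
    for d in dilations:
        att = np.zeros((H, W))
        for i in range(H):
            for j in range(W):
                # 3x3 dilated neighborhood, clipped at the borders.
                ys = [max(0, min(H - 1, i + d * dy)) for dy in (-1, 0, 1)]
                xs = [max(0, min(W - 1, j + d * dx)) for dx in (-1, 0, 1)]
                att[i, j] = feat[:, ys][:, :, xs].mean()
        maps.append(sigmoid(att))
    attention = np.mean(maps, axis=0)          # (H, W), values in (0, 1)
    return feat * attention[None]              # broadcast over channels
```

The key property the sketch captures is that when the gate `alpha` is low (poor depth), the fused output degrades gracefully toward the RGB-only path, while the wider dilated context in TSA lets spatially shifted depth evidence still influence the attention map.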