Paper Title
Robust RGB-D Fusion for Saliency Detection
Paper Authors
Paper Abstract
Efficiently exploiting multi-modal inputs for accurate RGB-D saliency detection is a topic of high interest. Most existing works leverage cross-modal interactions to fuse the two streams of RGB-D and enhance intermediate features. In this process, a practical aspect, the low quality of the available depth maps, has not yet been fully considered. In this work, we aim for RGB-D saliency detection that is robust to low-quality depth, which primarily appears in two forms: inaccuracy due to noise, and misalignment with RGB. To this end, we propose a robust RGB-D fusion method that benefits from (1) layer-wise, and (2) trident spatial, attention mechanisms. On the one hand, layer-wise attention (LWA) learns the trade-off between early and late fusion of RGB and depth features, depending on the depth accuracy. On the other hand, trident spatial attention (TSA) aggregates features from a wider spatial context to address the depth misalignment problem. The proposed LWA and TSA mechanisms allow us to efficiently exploit the multi-modal inputs for saliency detection while remaining robust against low-quality depth. Our experiments on five benchmark datasets demonstrate that the proposed fusion method performs consistently better than state-of-the-art fusion alternatives.
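To make the two mechanisms concrete, here is a minimal numpy sketch of the ideas the abstract describes. It is illustrative only: the abstract does not give the paper's actual architecture, so the scalar quality gate in `layer_wise_attention`, the element-wise sum used for early fusion, and the three-dilation averaging in `trident_spatial_attention` are all assumptions on my part.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_wise_attention(rgb_feat, depth_feat, w):
    """Hypothetical LWA sketch: predict a scalar gate from the depth
    features (a proxy for depth quality) and blend an early-fused
    feature with the RGB-only (late-fusion) path accordingly.
    Shapes: rgb_feat, depth_feat are (C, H, W); w is (C,)."""
    desc = depth_feat.mean(axis=(1, 2))        # global-average-pooled descriptor, (C,)
    alpha = sigmoid(float(desc @ w))           # learned-style scalar gate in (0, 1)
    early = rgb_feat + depth_feat              # early fusion: element-wise sum (assumed)
    late = rgb_feat                            # late path: RGB stream alone
    return alpha * early + (1.0 - alpha) * late

def trident_spatial_attention(feat, dilations=(1, 2, 3)):
    """Hypothetical TSA sketch: pool spatial context over 3x3 dilated
    neighborhoods at several rates and average the resulting attention
    maps, so cues from misaligned positions can still contribute."""
    C, H, W = feat.shape
    maps = []
    for d in dilations:
        att = np.zeros((H, W))
        for i in range(H):
            for j in range(W):
                # 3x3 dilated neighborhood, clipped at the borders.
                ys = [max(0, min(H - 1, i + d * dy)) for dy in (-1, 0, 1)]
                xs = [max(0, min(W - 1, j + d * dx)) for dx in (-1, 0, 1)]
                att[i, j] = feat[:, ys][:, :, xs].mean()
        maps.append(sigmoid(att))
    attention = np.mean(maps, axis=0)          # (H, W), values in (0, 1)
    return feat * attention[None]              # broadcast over channels
```

The key property the sketch captures is that when the gate `alpha` is low (poor depth), the fused output degrades gracefully toward the RGB-only path, while the wider dilated context in TSA lets spatially shifted depth evidence still influence the attention map.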