Title
SelfReformer: Self-Refined Network with Transformer for Salient Object Detection
Authors
Abstract
The global and local contexts significantly contribute to the integrity of predictions in Salient Object Detection (SOD). Unfortunately, existing methods still struggle to generate complete predictions with fine details. There are two major problems in conventional approaches: first, for the global context, high-level CNN-based encoder features cannot effectively capture long-range dependencies, resulting in incomplete predictions. Second, downsampling the ground truth to fit the size of the predictions introduces inaccuracy, as ground-truth details are lost during interpolation or pooling. Thus, in this work, we develop a Transformer-based network and frame a supervised task for a branch to learn the global context information explicitly. In addition, we adopt Pixel Shuffle from Super-Resolution (SR) to reshape the predictions back to the size of the ground truth instead of the reverse, so the details in the ground truth remain untouched. We also develop a two-stage Context Refinement Module (CRM) to fuse the global context and automatically locate and refine the local details in the predictions. The proposed network can guide and correct itself based on the generated global and local contexts, and is thus named Self-Refined Transformer (SelfReformer). Extensive experiments and evaluation results on five benchmark datasets demonstrate the outstanding performance of the network, and we achieve state-of-the-art results.
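The Pixel Shuffle operation the abstract borrows from super-resolution trades channel depth for spatial resolution: a tensor of shape (C·r², H, W) is rearranged into (C, H·r, W·r), so a low-resolution prediction can be upscaled to the ground-truth size instead of downsampling the ground truth. A minimal NumPy sketch of the rearrangement, assuming a single-image (channels, height, width) layout; the function name and shapes are illustrative, not the authors' implementation:

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r), as in super-resolution upscaling."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    # Split the channel axis into (C, r, r), then interleave the two r-axes
    # with the spatial axes so each r*r channel group fills an r*r pixel block.
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# Four 1x1 channels become one 2x2 map: channel k lands at pixel (k // 2, k % 2).
pred = np.arange(4, dtype=np.float32).reshape(4, 1, 1)
up = pixel_shuffle(pred, 2)
print(up[0])  # [[0. 1.]
              #  [2. 3.]]
```

Because the rearrangement is a pure reshape/transpose, it is lossless and differentiable, which is why upscaling predictions this way avoids the detail loss that interpolating or pooling the ground truth would introduce.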