Paper Title
PAANet: Visual Perception based Four-stage Framework for Salient Object Detection using High-order Contrast Operator
Paper Authors
Abstract
It is believed that the human visual system (HVS) consists of a pre-attentive process and an attention process when performing salient object detection (SOD). Based on this fact, we propose a four-stage framework for SOD, in which the first two stages match the \textbf{P}re-\textbf{A}ttentive process, consisting of general feature extraction (GFE) and feature preprocessing (FP), and the last two stages correspond to the \textbf{A}ttention process, containing saliency feature extraction (SFE) and feature aggregation (FA); hence the name \textbf{PAANet}. Mirroring the pre-attentive process, the GFE stage applies a fully-trained backbone and needs no further finetuning for different datasets, which greatly increases training speed. The FP stage plays the role of finetuning but works more efficiently because of its simpler structure and fewer parameters. Moreover, in the SFE stage we design a novel contrast operator for saliency feature extraction, which works more semantically than the traditional convolution operator when extracting the interactive information between the foreground and its surroundings. Interestingly, this contrast operator can be cascaded to form a deeper structure that extracts higher-order saliency, which is more effective for complex scenes. Comparative experiments with state-of-the-art methods on 5 datasets demonstrate the effectiveness of our framework.
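The abstract does not give the exact form of the contrast operator, so the following is only a hypothetical sketch of the general idea: a first-order contrast response can be modeled as a feature value minus the mean of its local neighborhood (foreground vs. surroundings), and cascading the operator yields higher-order contrast. The function `contrast_op` and the toy feature map are illustrative assumptions, not the paper's implementation.

```python
def contrast_op(feat, k=3):
    """Local contrast sketch: each value minus the mean of its k x k
    neighborhood (edges clamped), comparing a location against its
    surroundings. Assumed form, not the operator from the paper."""
    h, w = len(feat), len(feat[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Gather the clamped k x k neighborhood around (i, j).
            neigh = [feat[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                     for di in range(-r, r + 1) for dj in range(-r, r + 1)]
            out[i][j] = feat[i][j] - sum(neigh) / len(neigh)
    return out

# Toy 8x8 feature map with a small "foreground" patch.
feat = [[0.0] * 8 for _ in range(8)]
for i in (3, 4):
    for j in (3, 4):
        feat[i][j] = 1.0

first = contrast_op(feat)    # first-order contrast: foreground stands out
second = contrast_op(first)  # cascaded application: second-order contrast
```

Cascading simply feeds the operator's output back into itself, which is what allows a deeper stack of contrast operators to capture higher-order saliency cues.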