Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder

Paper Title

Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder

Authors

Li, Qian; Guo, Nan; Ye, Xiaochun; Fan, Dongrui; Tang, Zhimin

Abstract

Recently, many methods have been proposed for object detection. However, they cannot adaptively detect objects using semantic features. In this work, we mainly analyze how different methods detect objects adaptively, based on channel and spatial attention mechanisms. Some state-of-the-art detectors combine different feature pyramids with many mechanisms to enhance multi-level semantic information, but they incur a higher cost. This work addresses that with an anchor-free detector that uses a shared encoder-decoder with an attention mechanism to extract shared features. We take features from different levels of the backbone (e.g., ResNet-50) as the basis features. We then feed these features into a simple module, followed by a detector header that detects objects. Meanwhile, we use the semantic features to revise geometric locations, so the detector performs a pixel-semantic revision of position. More importantly, this work analyzes the impact of different pooling strategies (e.g., mean, maximum, or minimum) on multi-scale objects, and finds that minimum pooling better improves detection performance on small objects. Compared with the state-of-the-art MNC based on ResNet-101 on the standard MSCOCO 2014 baseline, our method improves detection AP by 3.8%.
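
The three pooling strategies named in the abstract (mean, maximum, minimum) differ only in how each window of a feature map is reduced to a single value. The following is a minimal, generic sketch of that difference in plain Python. It is not the authors' module, which the abstract does not describe. The helper `pool2d`, the sample `feature_map`, and the non-overlapping 2x2 window are all assumptions made for illustration.

```python
from statistics import mean


def pool2d(feature_map, size, reduce_fn):
    """Apply non-overlapping size x size pooling to a 2D list of numbers.

    Hypothetical helper for illustration only; reduce_fn decides the
    pooling strategy (mean, max, or min).
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every value inside the current window.
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            out_row.append(reduce_fn(window))
        pooled.append(out_row)
    return pooled


# Made-up 4x4 activation map (assumption, not data from the paper).
feature_map = [
    [1, 2, 0, 4],
    [3, 4, 8, 4],
    [5, 5, 2, 6],
    [5, 9, 2, 2],
]

mean_pooled = pool2d(feature_map, 2, mean)  # average response per window
max_pooled = pool2d(feature_map, 2, max)    # strongest response per window
min_pooled = pool2d(feature_map, 2, min)    # weakest response per window
```

Passing the reduction as `reduce_fn` keeps the windowing logic identical across strategies, so any difference in the pooled maps comes from the reduction alone.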
