Paper Title

Centralized Feature Pyramid for Object Detection

Authors

Yu Quan, Dong Zhang, Liyan Zhang, Jinhui Tang

Abstract

The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a wide range of applications. However, existing methods overly concentrate on inter-layer feature interactions while ignoring intra-layer feature regulation, which has been empirically shown to be beneficial. Although some methods try to learn a compact intra-layer feature representation with the help of the attention mechanism or the vision transformer, they still neglect the corner regions that are important for dense prediction tasks. To address this problem, in this paper we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation. Specifically, we first propose a spatially explicit visual center scheme, where a lightweight MLP is used to capture the global long-range dependencies and a parallel learnable visual center mechanism is used to capture the local corner regions of the input images. Based on this, we then propose a globally centralized regulation for the commonly used feature pyramid in a top-down fashion, where the explicit visual center information obtained from the deepest intra-layer feature is used to regulate the frontal shallow features. Compared to existing feature pyramids, CFP not only captures global long-range dependencies but also efficiently obtains an all-round yet discriminative feature representation. Experimental results on the challenging MS-COCO dataset validate that our proposed CFP achieves consistent performance gains on the state-of-the-art YOLOv5 and YOLOX object detection baselines.
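To make the two-branch design concrete, below is a minimal NumPy sketch of the explicit visual center idea described in the abstract: a lightweight per-position MLP stands in for the global branch, and a codebook-style "learnable visual center" stands in for the local branch, with the two outputs concatenated. All function names, shapes, and the residual/soft-assignment formulation are illustrative assumptions, not the paper's exact implementation (which uses trained convolutional/MLP blocks inside a detector).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_branch(x, w1, b1, w2, b2):
    # Simplified stand-in for the paper's lightweight MLP that models
    # global long-range dependencies: a two-layer channel MLP over
    # flattened spatial positions x of shape (HW, C).
    h = np.maximum(x @ w1 + b1, 0.0)   # (HW, C_hidden), ReLU
    return h @ w2 + b2                 # (HW, C)

def learnable_visual_center(x, codebook, scale):
    # Assumed codebook-style local branch: softly assign each spatial
    # position to K learnable codewords, aggregate residuals, and use
    # them as a channel-wise gate on the input.
    resid = x[:, None, :] - codebook[None, :, :]    # (HW, K, C)
    dist2 = (resid ** 2).sum(axis=-1)               # (HW, K)
    assign = softmax(-scale * dist2, axis=1)        # (HW, K) soft assignment
    agg = (assign[..., None] * resid).sum(axis=0)   # (K, C) aggregated residuals
    gate = sigmoid(agg.mean(axis=0))                # (C,) channel gate
    return x * gate                                 # (HW, C) gated features

def explicit_visual_center(x, params):
    # EVC sketch: concatenate the global (MLP) and local (LVC) branches.
    g = mlp_branch(x, *params["mlp"])
    l = learnable_visual_center(x, params["codebook"], params["scale"])
    return np.concatenate([g, l], axis=-1)          # (HW, 2C)

# Demo with random (untrained) parameters, shapes only.
rng = np.random.default_rng(0)
HW, C, K = 16, 8, 4
x = rng.standard_normal((HW, C))
params = {
    "mlp": (rng.standard_normal((C, C)), np.zeros(C),
            rng.standard_normal((C, C)), np.zeros(C)),
    "codebook": rng.standard_normal((K, C)),
    "scale": 1.0,
}
out = explicit_visual_center(x, params)             # (16, 16)
```

In the full CFP, the output of this block on the deepest pyramid level would then be propagated top-down to regulate the shallower levels; the sketch above only illustrates the intra-layer part.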
