Paper Title

Group R-CNN for Weakly Semi-supervised Object Detection with Points

Paper Authors

Shilong Zhang, Zhuoran Yu, Liyang Liu, Xinjiang Wang, Aojun Zhou, Kai Chen

Abstract

We study the problem of weakly semi-supervised object detection with points (WSSOD-P), where the training data is combined by a small set of fully annotated images with bounding boxes and a large set of weakly-labeled images with only a single point annotated for each instance. The core of this task is to train a point-to-box regressor on well-labeled images that can be used to predict credible bounding boxes for each point annotation. We challenge the prior belief that existing CNN-based detectors are not compatible with this task. Based on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN. Group R-CNN first uses instance-level proposal grouping to generate a group of proposals for each point annotation and thus can obtain a high recall rate. To better distinguish different instances and improve precision, we propose instance-level proposal assignment to replace the vanilla assignment strategy adopted in the original R-CNN methods. As naive instance-level assignment brings converging difficulty, we propose instance-aware representation learning which consists of instance-aware feature enhancement and instance-aware parameter generation to overcome this issue. Comprehensive experiments on the MS-COCO benchmark demonstrate the effectiveness of our method. Specifically, Group R-CNN significantly outperforms the prior method Point DETR by 3.9 mAP with 5% well-labeled images, which is the most challenging scenario. The source code can be found at https://github.com/jshilong/GroupRCNN
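To make the grouping idea concrete, below is a minimal, hypothetical sketch of instance-level proposal grouping: each annotated point collects the proposals whose centers lie closest to it, so every instance receives its own dedicated group of candidate boxes. This is an illustrative simplification under assumed inputs, not the authors' implementation (the function name `group_proposals_by_point` and the group size `k` are invented here; the paper's actual grouping operates on detector proposals inside Group R-CNN).

```python
import numpy as np

def group_proposals_by_point(proposals, points, k=3):
    """Toy sketch of instance-level proposal grouping.

    proposals: (N, 4) float array of boxes as (x1, y1, x2, y2)
    points:    (M, 2) float array, one annotated point per instance
    Returns a list of length M; entry i holds the indices of the k
    proposals whose centers are nearest to point i.
    """
    # Compute the center of each proposal box.
    centers = np.stack([(proposals[:, 0] + proposals[:, 2]) / 2,
                        (proposals[:, 1] + proposals[:, 3]) / 2], axis=1)
    groups = []
    for p in points:
        # Distance from every proposal center to this instance's point.
        dists = np.linalg.norm(centers - p, axis=1)
        # Keep the k closest proposals as this instance's group.
        groups.append(np.argsort(dists)[:k].tolist())
    return groups
```

Because each point gets its own group, recall is high even when proposals from nearby instances overlap; the subsequent instance-level assignment (rather than vanilla IoU-based assignment) is then needed to keep the groups from being confused with one another.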
