Paper Title
Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps
Paper Authors
Paper Abstract
Recently, remarkable progress has been made in weakly supervised object localization (WSOL) to promote object localization maps. The common practice for evaluating these maps is indirect and coarse: a tight bounding box is fitted to the high-activation regions, and intersection-over-union (IoU) scores are calculated between the predicted and ground-truth boxes. This measurement can assess the quality of localization maps to some extent, but we argue that the maps should be measured directly and finely, i.e., by comparing them with ground-truth object masks pixel by pixel. To enable this direct evaluation, we annotate pixel-level object masks on the ILSVRC validation set, and we propose IoU-Threshold curves for evaluating the real quality of localization maps. Beyond the amended evaluation metric and the annotated object masks, this work also introduces a novel self-enhancement method that harvests accurate object localization maps and object boundaries with only category labels as supervision. We propose a two-stage approach that generates the localization maps by simply comparing the similarity of point-wise features between the high-activation pixels and the remaining pixels. Based on the predicted localization maps, we further estimate object boundaries on a very large dataset, and we propose a hard-negative suppression loss for obtaining fine boundaries. We conduct extensive experiments on the ILSVRC and CUB benchmarks. In particular, the proposed Self-Enhancement Maps achieve a state-of-the-art localization accuracy of 54.88% on ILSVRC. The code and the annotated masks are released at https://github.com/xiaomengyc/SEM.
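The amended evaluation metric admits a compact implementation. Below is a minimal sketch, assuming NumPy arrays and a predicted map normalized to [0, 1]; the function name `iou_threshold_curve` and the granularity of the threshold sweep are illustrative choices, not taken from the released code.

```python
# A minimal sketch of the pixel-wise IoU-Threshold evaluation described
# in the abstract; names and the threshold sweep are illustrative.
import numpy as np

def iou_threshold_curve(loc_map, gt_mask, num_thresholds=255):
    """Sweep binarization thresholds over a localization map and compute
    the pixel-wise IoU against a ground-truth object mask at each one.

    loc_map : float array in [0, 1], the predicted localization map.
    gt_mask : bool array of the same shape, the annotated object mask.
    Returns (thresholds, ious) describing the IoU-Threshold curve.
    """
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    ious = []
    for t in thresholds:
        pred = loc_map >= t                              # binarize at threshold t
        inter = np.logical_and(pred, gt_mask).sum()
        union = np.logical_or(pred, gt_mask).sum()
        ious.append(inter / union if union > 0 else 0.0)
    return thresholds, np.array(ious)
```

Summarizing the whole curve (e.g., by its peak or its area) then scores a map independently of any single hand-picked threshold, which is the point of measuring the maps directly rather than through fitted boxes.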
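The second-stage map generation can likewise be sketched. Assuming PyTorch feature tensors, the snippet below promotes pixels whose point-wise features resemble those of the high-activation (seed) pixels; the percentile-based seed selection and the use of cosine similarity are assumptions for illustration and may differ from the paper's exact formulation.

```python
# A minimal sketch of second-stage map generation: compare each pixel's
# feature with the high-activation (seed) pixels from the first-stage
# map. Seed selection by a fixed percentile and cosine similarity are
# hypothetical choices, not necessarily those of the paper.
import torch
import torch.nn.functional as F

def self_enhancement_map(features, init_map, seed_percentile=0.9):
    """features : (C, H, W) tensor of point-wise features.
    init_map   : (H, W) tensor, first-stage localization map in [0, 1].
    Returns an (H, W) enhanced localization map in [0, 1].
    """
    c, h, w = features.shape
    feats = features.reshape(c, -1)                # (C, H*W) pixel features
    scores = init_map.reshape(-1)                  # (H*W,) activations

    # Treat the top (1 - seed_percentile) fraction of pixels as seeds.
    thresh = torch.quantile(scores, seed_percentile)
    seeds = feats[:, scores >= thresh]             # (C, N_seeds)

    # Cosine similarity between every pixel and every seed feature;
    # each pixel keeps its best match among the seeds.
    feats = F.normalize(feats, dim=0)
    seeds = F.normalize(seeds, dim=0)
    sim = seeds.t() @ feats                        # (N_seeds, H*W)
    enhanced = sim.max(dim=0).values.clamp(min=0)  # (H*W,) in [0, 1]
    return enhanced.reshape(h, w)
```

The intuition matches the abstract: pixels belonging to the object share features with the confidently activated region, so similarity to the seeds spreads activation over the full object extent rather than only its most discriminative parts.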