Title
Re-Attention Transformer for Weakly Supervised Object Localization
Authors
Abstract
Weakly supervised object localization is a challenging task that aims to localize objects using only coarse annotations such as image-level category labels. Existing deep network approaches are mainly based on the class activation map, which tends to highlight discriminative local regions while ignoring the full object. In addition, emerging transformer-based techniques often place heavy emphasis on background regions, which impedes the ability to identify complete objects. To address these issues, we present a re-attention mechanism termed the token refinement transformer (TRT), which captures object-level semantics to guide localization. Specifically, TRT introduces a novel module named the token priority scoring module (TPSM) to suppress the effects of background noise while focusing on the target object. We then incorporate the class activation map as a semantics-aware input to constrain the attention map to the target object. Extensive experiments on two benchmarks demonstrate the superiority of our proposed method over existing methods that use only image category annotations. Source code is available at \url{https://github.com/su-hui-zz/ReAttentionTransformer}.
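The abstract describes constraining a transformer's attention map with a class activation map so localization stays on the target object. The sketch below illustrates that general idea only, under stated assumptions: it fuses per-layer attention via standard attention rollout and combines the resulting CLS-to-patch map with a CAM by elementwise product. The `elementwise product` fusion and the function names are hypothetical stand-ins; the paper's actual mechanism (TPSM) is more involved and is not reproduced here.

```python
import numpy as np

def attention_rollout(attentions):
    """Fuse per-layer self-attention matrices by matrix product,
    adding the identity to account for residual connections.
    attentions: list of (tokens, tokens) arrays (heads pre-averaged)."""
    n = attentions[0].shape[0]
    rollout = np.eye(n)
    for attn in attentions:
        a = attn + np.eye(n)                      # residual connection
        a = a / a.sum(axis=-1, keepdims=True)     # re-normalize rows
        rollout = a @ rollout
    return rollout

def cam_restrained_localization(attentions, cam, grid=14):
    """Hypothetical fusion: restrain the transformer attention map with a
    class activation map via elementwise product (a simple stand-in for
    the paper's token priority scoring, not the authors' implementation).
    Assumes token 0 is the CLS token and the rest form a grid x grid patch map."""
    rollout = attention_rollout(attentions)
    cls_to_patch = rollout[0, 1:]                 # CLS token's attention to patches
    attn_map = cls_to_patch.reshape(grid, grid)
    # Min-max normalize both maps before fusing.
    attn_map = (attn_map - attn_map.min()) / (np.ptp(attn_map) + 1e-8)
    cam = (cam - cam.min()) / (np.ptp(cam) + 1e-8)
    fused = attn_map * cam                        # semantics-aware restraint
    return fused / (fused.max() + 1e-8)
```

A localization box can then be obtained by thresholding the fused map and taking the bounding box of the largest connected component, as is common in CAM-based pipelines.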