Paper Title
Fully Self-Supervised Learning for Semantic Segmentation
Paper Authors
Paper Abstract
In this work, we present a fully self-supervised framework for semantic segmentation (FS^4). A fully bootstrapped strategy for semantic segmentation, which saves the effort of large-scale annotation, is crucial for building customized models end-to-end for open-world domains, and such a capability is in high demand in realistic scenarios. Although recent self-supervised semantic segmentation methods have made great progress, they still depend heavily on fully supervised pretrained models, which makes a fully self-supervised pipeline impossible. To solve this problem, we propose a bootstrapped training scheme for semantic segmentation that fully leverages global semantic knowledge for self-supervision through our proposed PGG strategy and CAE module. In particular, we perform pixel clustering and assignment to provide segmentation supervision. To prevent the clustering from collapsing into clutter, we propose 1) a pyramid-global-guided (PGG) training strategy that supervises learning with pyramid image/patch-level pseudo-labels generated by grouping unsupervised features; the stable global and pyramid semantic pseudo-labels prevent the segmentation from learning too many clutter regions or degenerating into a single background region; and 2) a context-aware embedding (CAE) module that generates global feature embeddings by non-trivially incorporating neighbors that are close in both space and appearance. We evaluate our method on the large-scale COCO-Stuff dataset and achieve an improvement of 7.19 mIoU on both things and stuff objects.
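To make the "pixel clustering and assignment" step concrete, the following is a minimal, hypothetical sketch of clustering per-pixel features and using the cluster assignments as segmentation pseudo-labels. The function name, feature shapes, and plain k-means procedure are illustrative assumptions for exposition, not the paper's actual implementation (which additionally uses the PGG strategy and CAE module described above).

```python
# Illustrative sketch only: k-means over per-pixel features to produce
# segmentation pseudo-labels. Names and shapes are assumptions, not the
# paper's implementation.
import numpy as np

def cluster_pixel_features(features, num_clusters=8, iters=10, seed=0):
    """Cluster per-pixel features with plain k-means.

    features: (H, W, C) array of per-pixel embeddings.
    Returns an (H, W) map of cluster ids usable as pseudo-labels.
    """
    h, w, c = features.shape
    x = features.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen pixels.
    centers = x[rng.choice(len(x), num_clusters, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest cluster center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Update each center as the mean of its assigned pixels.
        for k in range(num_clusters):
            if (labels == k).any():
                centers[k] = x[labels == k].mean(0)
    return labels.reshape(h, w)

# Example: cluster random features for a toy 16x16 "image".
feats = np.random.default_rng(1).normal(size=(16, 16, 4))
pseudo = cluster_pixel_features(feats, num_clusters=3)
print(pseudo.shape)  # (16, 16)
```

Without further guidance, such a clustering can easily collapse or fragment; this is exactly the failure mode the abstract's PGG pseudo-labels are designed to prevent.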