Paper Title
PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptive Semantic Segmentation
Paper Authors
Paper Abstract
Unsupervised Domain Adaptation (UDA) aims to enhance the generalization of the learned model to other domains. Domain-invariant knowledge is transferred from a model trained on a labeled source domain, e.g., a video game, to unlabeled target domains, e.g., real-world scenarios, saving annotation expenses. Existing UDA methods for semantic segmentation usually focus on minimizing the inter-domain discrepancy at various levels, e.g., pixels, features, and predictions, to extract domain-invariant knowledge. However, the primary intra-domain knowledge, such as context correlation inside an image, remains underexplored. To fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation, which promotes intra-image pixel-wise correlations and patch-wise semantic consistency across different contexts. The proposed framework exploits the inherent structures of intra-domain images: (1) it explicitly encourages learning discriminative pixel-wise features with intra-class compactness and inter-class separability, and (2) it motivates robust feature learning of the same patch under different contexts or fluctuations. Extensive experiments verify the effectiveness of the proposed method, which obtains competitive accuracy on two widely used UDA benchmarks, i.e., 75.6 mIoU on GTA to Cityscapes and 68.2 mIoU on Synthia to Cityscapes. Moreover, our method is compatible with other UDA approaches and can further improve performance without introducing extra parameters.
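To make the two self-supervised objectives in the abstract concrete, below is a minimal PyTorch sketch of (1) a supervised pixel-wise contrastive loss (intra-class compactness, inter-class separability) and (2) a patch-wise consistency term between two crops of the same image. This is an illustration under stated assumptions, not the authors' released implementation: the function names, the projection-head embeddings, the pixel subsampling, and the L2 patch agreement (the paper formulates the patch term contrastively) are all hypothetical choices for brevity.

```python
import torch
import torch.nn.functional as F


def pixel_contrast_loss(feats, labels, temperature=0.1, max_pixels=1024):
    """Supervised pixel-wise contrast: pixels sharing a class label are
    positives, pixels of other classes are negatives.

    feats:  (N, C) pixel embeddings from a projection head (assumed).
    labels: (N,)   class index per pixel (ground truth on source,
                   pseudo-labels on target in a self-training setup).
    """
    # Subsample pixels so the O(M^2) similarity matrix stays tractable.
    idx = torch.randperm(feats.size(0))[:max_pixels]
    feats = F.normalize(feats[idx], dim=1)
    labels = labels[idx]

    sim = feats @ feats.t() / temperature                  # (M, M)
    pos_mask = (labels[:, None] == labels[None, :]).float()
    pos_mask.fill_diagonal_(0)                             # drop self-pairs

    exp_sim = torch.exp(sim)
    # log-probability of pair (i, j) against all pairs (i, k), k != i
    denom = exp_sim.sum(dim=1) - exp_sim.diagonal()
    log_prob = sim - torch.log(denom)[:, None]

    n_pos = pos_mask.sum(dim=1).clamp(min=1)               # avoid div by 0
    return -(pos_mask * log_prob).sum(dim=1).div(n_pos).mean()


def patch_consistency_loss(feat_a, feat_b):
    """Consistency on the overlapping region of two crops of one image:
    the shared patch should yield the same features regardless of the
    surrounding context.

    feat_a, feat_b: (C, h, w) features of the overlap region from each
    crop, already aligned spatially. A simple L2 agreement stands in for
    the paper's patch-wise contrastive term.
    """
    return F.mse_loss(F.normalize(feat_a, dim=0),
                      F.normalize(feat_b, dim=0))
```

In a typical UDA pipeline, both terms would be added as auxiliary losses on top of a self-training segmentation objective; because they only shape the feature space, they introduce no extra parameters at inference, consistent with the abstract's claim.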