Paper Title
Contrastive Learning for Label-Efficient Semantic Segmentation
Paper Authors
Paper Abstract
Collecting labeled data for the task of semantic segmentation is expensive and time-consuming, as it requires dense pixel-level annotations. While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases. This happens because deep CNNs trained with the de facto cross-entropy loss can easily overfit to small amounts of labeled data. To address this issue, we propose a simple and effective contrastive learning-based training strategy in which we first pretrain the network using a pixel-wise, label-based contrastive loss, and then fine-tune it using the cross-entropy loss. This approach increases intra-class compactness and inter-class separability, thereby resulting in a better pixel classifier. We demonstrate the effectiveness of the proposed training strategy using the Cityscapes and PASCAL VOC 2012 segmentation datasets. Our results show that pretraining with the proposed contrastive loss results in large performance gains (more than 20% absolute improvement in some settings) when the amount of labeled data is limited. In many settings, the proposed contrastive pretraining strategy, which does not use any additional data, is able to match or outperform the widely-used ImageNet pretraining strategy that uses more than a million additional labeled images.
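The pretraining stage described above uses a pixel-wise, label-based (i.e., supervised) contrastive loss: embeddings of pixels sharing a class label are pulled together, while embeddings of pixels from different classes are pushed apart. A minimal sketch of such a loss is shown below; the function name, the toy 2-D embeddings, and the temperature value are illustrative assumptions, and the paper's exact formulation (sampling scheme, normalization details, temperature) may differ.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Project onto the unit sphere so similarities are cosine similarities.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def pixel_contrastive_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss over a set of pixel embeddings.

    For each anchor pixel i, every other pixel with the same label is a
    positive; all remaining pixels act as negatives in the denominator.
    Hypothetical sketch -- not the paper's exact implementation.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        denom = sum(math.exp(dot(z[i], z[a]) / tau) for a in range(n) if a != i)
        loss_i = -sum(
            math.log(math.exp(dot(z[i], z[p]) / tau) / denom)
            for p in positives
        ) / len(positives)
        total += loss_i
        count += 1
    return total / count
```

As a sanity check, embeddings that are compact within a class and separated across classes should yield a lower loss than embeddings where classes are intermixed, which is exactly the intra-class compactness / inter-class separability property the abstract attributes to this pretraining objective.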