AuxMix: Semi-Supervised Learning with Unconstrained Unlabeled Data

Paper title

AuxMix: Semi-Supervised Learning with Unconstrained Unlabeled Data

Authors

Amin Banitalebi-Dehkordi, Pratik Gujjar, Yong Zhang

Abstract

Semi-supervised learning (SSL) has seen great strides when labeled data is scarce but unlabeled data is abundant. Critically, most recent work assumes that such unlabeled data is drawn from the same distribution as the labeled data. In this work, we show that state-of-the-art SSL algorithms suffer a degradation in performance in the presence of unlabeled auxiliary data that does not necessarily possess the same class distribution as the labeled set. We term this problem Auxiliary-SSL and propose AuxMix, an algorithm that leverages self-supervised learning tasks to learn generic features in order to mask auxiliary data that are not semantically similar to the labeled set. We also propose to regularize learning by maximizing the predicted entropy for dissimilar auxiliary samples. We show an improvement of 5% over existing baselines on a ResNet-50 model when trained on the CIFAR10 dataset with 4k labeled samples, with all unlabeled data drawn from the Tiny-ImageNet dataset. We report competitive results on several datasets and conduct ablation studies.
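The two ideas named in the abstract, masking dissimilar auxiliary samples and maximizing predicted entropy on them, can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity test against class prototypes, the `threshold` parameter, and all function names below are assumptions introduced for illustration only, and the abstract does not specify how similarity is actually measured.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_mask(aux_features, labeled_prototypes, threshold):
    """Hypothetical mask: 1 if an auxiliary sample is close to any
    labeled-set prototype in feature space, else 0."""
    return [
        1 if max(cosine_similarity(f, p) for p in labeled_prototypes) >= threshold else 0
        for f in aux_features
    ]


def entropy_regularizer(aux_logits, mask):
    """Negative mean entropy over samples masked as dissimilar (mask == 0).
    Minimizing this term pushes their predictions toward uniform."""
    dissimilar = [entropy(softmax(l)) for l, m in zip(aux_logits, mask) if m == 0]
    if not dissimilar:
        return 0.0
    return -sum(dissimilar) / len(dissimilar)
```

In this sketch, samples with `mask == 1` would be passed to an ordinary SSL loss, while samples with `mask == 0` contribute only through `entropy_regularizer`. How the features are learned (the abstract mentions self-supervised tasks) is outside the scope of the example.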
