Paper Title

Unbiased Supervised Contrastive Learning

Paper Authors

Carlo Alberto Barbano, Benoit Dufumier, Enzo Tartaglione, Marco Grangetto, Pietro Gori

Paper Abstract

Many datasets are biased: they contain easy-to-learn features that are highly correlated with the target class only in the dataset, but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a very relevant research topic in recent years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Based on this framework, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE) that provides more accurate control over the minimal distance between positive and negative samples. Furthermore, thanks to the same framework, we propose FairKL, a new debiasing regularization loss that works well even with extremely biased data. We validate the proposed losses on standard vision datasets, including CIFAR10, CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL combined with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of bias in the wild.
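
To make the two losses concrete, here is a minimal PyTorch sketch of an epsilon-margin supervised contrastive loss in the spirit of epsilon-SupInfoNCE as the abstract describes it. This is not the authors' reference implementation: the function name `eps_supinfonce` is hypothetical, and the exact placement of epsilon relative to the temperature is an assumption here. The idea shown is that epsilon is subtracted from each positive logit, so a positive must exceed every negative similarity by at least epsilon before its loss term vanishes.

```python
import torch
import torch.nn.functional as F

def eps_supinfonce(features, labels, epsilon=0.1, temperature=0.1):
    # features: (N, D) raw embeddings; labels: (N,) integer class labels.
    z = F.normalize(features, dim=1)          # work in cosine-similarity space
    sim = z @ z.t() / temperature             # (N, N) scaled similarities
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=sim.device)
    same_y = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = same_y & ~eye                  # same class, excluding the anchor itself
    neg_mask = ~same_y                        # different class

    losses = []
    for i in range(n):
        pos = sim[i][pos_mask[i]]             # similarities to anchor i's positives
        neg = sim[i][neg_mask[i]]             # similarities to anchor i's negatives
        if pos.numel() == 0 or neg.numel() == 0:
            continue
        # -log softmax of (s_ip - epsilon) against all negatives, averaged over
        # positives: each positive should beat every negative by at least epsilon.
        logits = torch.cat(
            [pos.unsqueeze(1) - epsilon,
             neg.unsqueeze(0).expand(pos.numel(), -1)], dim=1)
        losses.append(-F.log_softmax(logits, dim=1)[:, 0].mean())
    if not losses:
        return features.new_zeros(())
    return torch.stack(losses).mean()
```

FairKL can be sketched in the same spirit. One plausible reading of a KL-based debiasing regularizer, under a Gaussian assumption on the pair-similarity distributions, is the closed-form KL divergence between the similarity statistics of bias-aligned pairs (same bias attribute) and bias-conflicting pairs (different bias attribute). The restriction to same-class pairs and the use of batch-level rather than per-anchor statistics are simplifications; the paper's exact definition of the pair sets may differ.

```python
def fairkl_reg(features, labels, bias_labels, eps=1e-8):
    # features: (N, D); labels: (N,) targets; bias_labels: (N,) bias attribute.
    z = F.normalize(features, dim=1)
    sim = z @ z.t()
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    same_y = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    same_b = bias_labels.unsqueeze(0) == bias_labels.unsqueeze(1)
    aligned = sim[same_y & same_b]        # same class, same bias attribute
    conflicting = sim[same_y & ~same_b]   # same class, different bias attribute
    if aligned.numel() < 2 or conflicting.numel() < 2:
        return sim.new_zeros(())
    mu_a, var_a = aligned.mean(), aligned.var() + eps
    mu_c, var_c = conflicting.mean(), conflicting.var() + eps
    # closed-form KL divergence between the two fitted univariate Gaussians
    return 0.5 * (torch.log(var_c / var_a)
                  + (var_a + (mu_a - mu_c) ** 2) / var_c - 1.0)
```

In a training loop, the combined objective would then look like `loss = eps_supinfonce(z, y) + alpha * fairkl_reg(z, y, b)`, where `alpha` weights the debiasing term; all names here are illustrative.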
