Paper Title
Tripartite: Tackle Noisy Labels by a More Precise Partition
Paper Authors
Abstract
Samples in large-scale datasets may be mislabeled for various reasons, and Deep Neural Networks can easily overfit to noisy label data. To tackle this problem, the key is to alleviate the harm of these noisy labels. Many existing methods try to divide the training data into clean and noisy subsets in terms of loss values, and then process the noisy label data in different ways. One of the obstacles to better performance is hard samples. Since hard samples always have relatively large losses whether their labels are clean or noisy, these methods cannot partition them precisely. Instead, we propose a Tripartite solution that partitions the training data more precisely into three subsets: hard, noisy, and clean. The partition criteria are based on the inconsistency between the predictions of two networks, and the inconsistency between the prediction of a network and the given label. To minimize the harm of noisy labels while maximizing the value of noisy label data, we apply low-weight learning to hard data and self-supervised learning to noisy label data without using the given labels. Extensive experiments demonstrate that Tripartite can filter out noisy label data more precisely, and outperforms most state-of-the-art methods on five benchmark datasets, especially on real-world datasets.
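The partition criteria described above can be sketched as follows. This is an illustrative sketch based only on the abstract, not the authors' exact implementation: samples on which the two networks disagree are treated as hard, samples on which they agree but contradict the given label as noisy, and the rest as clean. The function name and signature are hypothetical.

```python
import numpy as np

def tripartite_partition(preds_a, preds_b, labels):
    """Illustrative sketch of a tripartite partition (not the paper's
    exact criterion): split sample indices into hard / noisy / clean
    subsets from the argmax predictions of two networks and the labels."""
    preds_a = np.asarray(preds_a)
    preds_b = np.asarray(preds_b)
    labels = np.asarray(labels)

    agree = preds_a == preds_b           # the two networks agree
    match = preds_a == labels            # prediction matches the given label

    hard = np.where(~agree)[0]           # networks disagree -> hard
    noisy = np.where(agree & ~match)[0]  # agree, but contradict label -> noisy
    clean = np.where(agree & match)[0]   # agree and match label -> clean
    return hard, noisy, clean

# Toy usage: 4 samples, predictions from two networks and given labels.
hard, noisy, clean = tripartite_partition(
    preds_a=[0, 1, 2, 1],
    preds_b=[0, 2, 2, 1],
    labels=[0, 1, 1, 1],
)
```

Under this sketch, the hard subset would then receive low-weight supervised learning, while the noisy subset would be used only through self-supervised objectives that ignore the given labels.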