Paper Title
Neighborhood Collective Estimation for Noisy Label Identification and Correction
Authors
Abstract
Learning with noisy labels (LNL) aims to design strategies that improve model performance and generalization by mitigating the effects of overfitting to noisy labels. The key to success in LNL lies in identifying as many clean samples as possible from massive noisy data while correcting the wrongly assigned noisy labels. Recent advances use the predicted label distribution of each individual sample to perform noise verification and noisy label correction, which easily gives rise to confirmation bias. To mitigate this issue, we propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors. Specifically, our method is divided into two steps: 1) Neighborhood Collective Noise Verification, which separates all training samples into a clean subset and a noisy subset, and 2) Neighborhood Collective Label Correction, which relabels the noisy samples; auxiliary techniques are then used to assist further model optimization. Extensive experiments on four commonly used benchmark datasets, i.e., CIFAR-10, CIFAR-100, Clothing-1M and WebVision-1.0, demonstrate that our proposed method considerably outperforms state-of-the-art methods.
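The abstract does not state the exact scoring or relabeling rules, so the following is only a minimal NumPy sketch of the general neighborhood idea under our own simplifying assumptions, not the authors' method. We assume that neighbors are the k nearest samples by cosine similarity of feature vectors; that the neighborhood estimate is the plain average of the neighbors' predicted label distributions; that a sample counts as clean when the mean of its own and its neighborhood's probability for its given label reaches a threshold; and that noisy samples are relabeled with the argmax of that same averaged distribution. The function names `knn_indices`, `neighborhood_distribution`, `verify_noise` and `correct_labels`, the `threshold` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def knn_indices(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of each sample's k nearest neighbors
    under cosine similarity, excluding the sample itself."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]  # most similar first


def neighborhood_distribution(probs: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    """Assumption: the neighborhood estimate is the plain mean of the
    neighbors' predicted distributions, shape (N, k, C) -> (N, C)."""
    return probs[neighbors].mean(axis=1)


def verify_noise(probs, labels, neighbors, threshold=0.5):
    """Hypothetical noise verification: a sample is clean if the average of
    its own and its neighborhood's probability for its given label is high."""
    neigh = neighborhood_distribution(probs, neighbors)
    idx = np.arange(len(labels))
    score = 0.5 * (probs[idx, labels] + neigh[idx, labels])
    return score >= threshold


def correct_labels(probs, labels, neighbors, clean_mask):
    """Hypothetical label correction: relabel samples flagged as noisy with
    the argmax of the averaged (own + neighborhood) distribution."""
    neigh = neighborhood_distribution(probs, neighbors)
    combined = 0.5 * (probs + neigh)
    new_labels = labels.copy()
    new_labels[~clean_mask] = combined[~clean_mask].argmax(axis=1)
    return new_labels


# Toy data: two feature clusters; sample 2 lies in cluster A but is labeled 1.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
                  [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4],
                  [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]])
labels = np.array([0, 0, 1, 1, 1, 1])

nbrs = knn_indices(feats, k=2)
clean = verify_noise(probs, labels, nbrs)
fixed = correct_labels(probs, labels, nbrs, clean)
print(clean.tolist())  # [True, True, False, True, True, True]
print(fixed.tolist())  # [0, 0, 0, 1, 1, 1]
```

Pooling the neighbors' predictions is what lets a sample's verdict depend on more than its own, possibly memorized, prediction: in the toy data the model still assigns 0.4 probability to sample 2's wrong label, but its two neighbors lower the combined score to 0.275 and pull the corrected label back to class 0.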