Paper Title
Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization
Paper Authors
Paper Abstract
Sample selection is an effective strategy for mitigating the effect of label noise in robust learning. Typical strategies commonly apply the small-loss criterion to identify clean samples. However, samples lying around the decision boundary, which incur large losses, are usually entangled with noisy examples and are discarded under this criterion, leading to severe degradation of generalization performance. In this paper, we propose a novel selection strategy, \textbf{S}elf-\textbf{F}il\textbf{t}ering (SFT), which exploits the fluctuation of noisy examples in historical predictions to filter them out, thereby avoiding the selection bias of the small-loss criterion against boundary examples. Specifically, we introduce a memory bank module that stores the historical predictions of each example and is dynamically updated to support sample selection in subsequent learning iterations. Moreover, to reduce the error accumulated by the sample selection bias of SFT, we devise a regularization term that penalizes confident output distributions. By increasing the weight of the misclassified categories through this term, the loss function is robust to label noise under mild conditions. We conduct extensive experiments on three benchmarks with various noise types and achieve new state-of-the-art results. Ablation studies and further analysis verify the advantages of SFT for sample selection in robust learning.
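To make the two mechanisms in the abstract concrete, below is a minimal sketch, not the authors' implementation: a memory bank that records each example's recent predictions, a fluctuation rule that flags unstable examples as likely noisy, and a confidence-penalized loss. All names (`PredictionMemoryBank`, `select_clean`), the window size, the exact fluctuation rule, and the entropy-based penalty are illustrative assumptions; in particular, the paper's regularizer reweights misclassified categories, whereas the sketch uses a generic negative-entropy confidence penalty.

```python
# Minimal sketch of fluctuation-based sample selection with a memory bank
# and a confidence-penalized loss. Hypothetical names and rules throughout;
# this is not the paper's code.

import numpy as np

class PredictionMemoryBank:
    """Stores the last `window` predicted labels for every training example."""

    def __init__(self, num_examples: int, window: int = 3):
        self.window = window
        # history[i] holds the most recent predicted class indices for example i
        self.history = [[] for _ in range(num_examples)]

    def update(self, indices, predicted_labels):
        """Append the current epoch's predictions for a batch of examples."""
        for i, y_hat in zip(indices, predicted_labels):
            self.history[i].append(int(y_hat))
            if len(self.history[i]) > self.window:
                self.history[i].pop(0)

    def fluctuates(self, i) -> bool:
        """Assumed fluctuation rule: an example is suspect if its predicted
        label changed within the stored window."""
        h = self.history[i]
        return len(h) >= 2 and len(set(h)) > 1

def select_clean(bank, indices):
    """Keep only the examples whose recent predictions are stable."""
    return [i for i in indices if not bank.fluctuates(i)]

def confidence_penalized_loss(probs, labels, alpha=0.1):
    """Cross-entropy plus a penalty on over-confident outputs.

    The penalty is the negative entropy of the predicted distribution
    (a common way to discourage confident outputs); the paper's exact
    term, which reweights misclassified categories, may differ.
    """
    n = probs.shape[0]
    eps = 1e-12
    ce = -np.log(probs[np.arange(n), labels] + eps)
    neg_entropy = np.sum(probs * np.log(probs + eps), axis=1)
    return np.mean(ce + alpha * neg_entropy)

# Toy usage: two examples observed over three epochs of predictions.
bank = PredictionMemoryBank(num_examples=2, window=3)
for epoch_preds in ([0, 1], [0, 2], [0, 1]):   # example 1 fluctuates
    bank.update([0, 1], epoch_preds)
print(select_clean(bank, [0, 1]))              # -> [0]
```

The selection step would run once per epoch after the memory bank is refreshed, and the penalized loss would then be computed only on the selected subset; both the scheduling and the weight `alpha` are assumptions here.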