Paper Title

A Second-Order Approach to Learning with Instance-Dependent Label Noise

Authors

Zhaowei Zhu, Tongliang Liu, Yang Liu

Abstract

The presence of label noise often misleads the training of deep neural networks. Recent literature largely assumes that the label noise rate is determined only by the true label class. In contrast, errors in human-annotated labels are more likely to depend on the difficulty of the task, which results in settings with instance-dependent label noise. We first provide evidence that heterogeneous instance-dependent label noise effectively down-weights the examples with higher noise rates in a non-uniform way and thus causes imbalances, which makes directly applying methods designed for class-dependent label noise questionable. Building on the recent peer loss [24], we then propose and study the potential of a second-order approach that leverages estimates of several covariance terms defined between the instance-dependent noise rates and the Bayes optimal label. We show that this set of second-order statistics successfully captures the induced imbalances. We further show that, with the help of the estimated second-order statistics, we can identify a new loss function under which the expected risk of a classifier with instance-dependent label noise is equivalent to that of a new problem with only class-dependent label noise. This fact allows us to apply existing solutions for this better-studied setting. We provide an efficient procedure to estimate these second-order statistics without access to either ground-truth labels or prior knowledge of the noise rates. Experiments on CIFAR10 and CIFAR100 with synthetic instance-dependent label noise, and on Clothing1M with real-world human label noise, verify our approach. Our implementation is available at https://github.com/UCSC-REAL/CAL.
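The down-weighting claim in the abstract can be made concrete with a simplified illustration that is not taken from the paper. Assume binary labels $y \in \{0, 1\}$, the 0-1 loss $\ell$, and a hypothetical per-instance flip probability $e(x) < 1/2$ that does not depend on the true label. The symbols $e(x)$, $\tilde{Y}$ (the noisy label), and $\ell$ are assumptions introduced only for this sketch. Because the 0-1 loss satisfies $\ell(f(x), 1 - y) = 1 - \ell(f(x), y)$, the expected noisy loss at a fixed instance is:

```latex
\begin{align*}
\mathbb{E}\big[\ell(f(x), \tilde{Y}) \mid x, y\big]
  &= (1 - e(x))\,\ell(f(x), y) + e(x)\,\ell(f(x), 1 - y) \\
  &= (1 - e(x))\,\ell(f(x), y) + e(x)\,\big(1 - \ell(f(x), y)\big) \\
  &= (1 - 2e(x))\,\ell(f(x), y) + e(x).
\end{align*}
```

In this toy setting the clean loss of each example is scaled by $1 - 2e(x)$, so examples with higher flip rates contribute less to the noisy risk. If $e(x)$ were the same constant for every instance, the factor would be uniform and would only rescale the risk. The imbalance appears when $e(x)$ varies across instances.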
