Paper Title

Robustness to Label Noise Depends on the Shape of the Noise Distribution in Feature Space

Paper Authors

Diane Oyen, Michal Kucer, Nick Hengartner, Har Simrat Singh

Paper Abstract

Machine learning classifiers have been demonstrated, both empirically and theoretically, to be robust to label noise under certain conditions -- notably, the typical assumption is that label noise is independent of the features given the class label. We provide a theoretical framework that generalizes beyond this typical assumption by modeling label noise as a distribution over feature space. We show that both the scale and the shape of the noise distribution influence the posterior likelihood, and that the shape of the noise distribution has a stronger impact on classification performance if the noise is concentrated in feature space where the decision boundary can be moved. For the special case of uniform label noise (independent of features and the class label), we show that the Bayes optimal classifier for $c$ classes is robust to label noise until the ratio of noisy samples goes above $\frac{c-1}{c}$ (e.g. 90% for 10 classes), which we call the tipping point. However, for the special case of class-dependent label noise (independent of features given the class label), the tipping point can be as low as 50%. Most importantly, we show that when the noise distribution targets decision boundaries (label noise is directly dependent on feature space), classification robustness can drop off even at a small scale of noise. Even when evaluating recent label-noise mitigation methods, we see reduced accuracy when label noise is dependent on features. These findings explain why machine learning often handles label noise well if the noise distribution is uniform in feature space; yet they also point to the difficulty of overcoming label noise when it is concentrated in a region of feature space where a decision boundary can move.
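The tipping-point claim for uniform noise can be checked with a few lines of arithmetic. The sketch below is my own illustration (not code from the paper), assuming the standard uniform-noise model: a fraction `eps` of labels is flipped, each flipped label landing uniformly on one of the other `c - 1` classes. The true class then remains the most probable observed label exactly when `1 - eps > eps / (c - 1)`, i.e. when `eps < (c - 1) / c`.

```python
def tipping_point(c: int) -> float:
    """Noise ratio above which the Bayes optimal classifier is no
    longer robust to uniform label noise over c classes."""
    return (c - 1) / c

def true_label_is_mode(c: int, eps: float) -> bool:
    """Under uniform label noise at rate eps, does the true class
    remain the most likely observed label at any point in feature
    space? Probability mass: (1 - eps) on the true class, and
    eps / (c - 1) on each of the other classes."""
    return (1 - eps) > eps / (c - 1)

print(tipping_point(10))              # 0.9, matching the abstract's 90% for 10 classes
print(tipping_point(2))               # 0.5 for binary classification
print(true_label_is_mode(10, 0.85))   # True: below the tipping point
print(true_label_is_mode(10, 0.95))   # False: above the tipping point
```

This only covers the uniform special case; as the abstract notes, class-dependent and feature-dependent noise break the symmetry this calculation relies on, which is why their tipping points can be far lower.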
