Paper title
Extended T: Learning with Mixed Closed-set and Open-set Noisy Labels
Paper authors
Paper abstract
The label noise transition matrix $T$, which gives the probabilities that true labels flip into noisy ones, is vital for modeling label noise and designing statistically consistent classifiers. The traditional transition matrix can only model closed-set label noise, where the true class labels of the noisy training data lie within the noisy label set. It is ill-suited to modeling open-set label noise, where some true class labels lie outside the noisy label set. Thus, in the more realistic situation where both closed-set and open-set label noise occur, existing methods give biased solutions. Moreover, the traditional transition matrix can only model instance-independent label noise, a restriction that may hurt performance in practice. In this paper, we focus on learning under mixed closed-set and open-set label noise. We address the aforementioned issues by extending the traditional transition matrix so that it can model mixed label noise, and by further extending it to a cluster-dependent transition matrix that better approximates the instance-dependent label noise found in real-world applications. We term the proposed transition matrix the cluster-dependent extended transition matrix. We design an unbiased estimator (the extended $T$-estimator) that estimates the cluster-dependent extended transition matrix using only the noisy data. Comprehensive experiments on synthetic and real data show that our method models mixed label noise better, as reflected in its more robust performance compared with prior state-of-the-art label-noise learning methods.
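To make the role of $T$ concrete, the following is a minimal sketch of how a transition matrix maps a clean class posterior to a noisy label posterior. It is not the paper's implementation. The matrix entries, the helper name `noisy_posterior`, and the choice to represent open-set instances with a single extra row are all illustrative assumptions; the abstract does not specify the extended matrix's exact shape.

```python
# Illustrative sketch only. The numbers, the helper name, and the
# "one extra row for open-set instances" layout are assumptions,
# not definitions or values taken from the paper.

def noisy_posterior(T, clean_posterior):
    """Return P(noisy = j | x) = sum_i T[i][j] * P(true = i | x)."""
    num_noisy = len(T[0])
    return [
        sum(T[i][j] * clean_posterior[i] for i in range(len(T)))
        for j in range(num_noisy)
    ]

# Closed-set case: 3 true classes, 3 noisy labels. Row i holds the
# probabilities that true class i is observed as each noisy label.
T_closed = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

# Assumed extension: an extra row for instances whose true class lies
# outside the noisy label set. They still receive one of the 3 noisy labels.
T_extended = T_closed + [[0.3, 0.3, 0.4]]

# Every row is a probability distribution over noisy labels.
for row in T_extended:
    assert abs(sum(row) - 1.0) < 1e-9

clean_closed = [0.7, 0.2, 0.1]
clean_mixed = [0.5, 0.2, 0.1, 0.2]  # last entry: mass on the open-set class

closed_result = noisy_posterior(T_closed, clean_closed)
mixed_result = noisy_posterior(T_extended, clean_mixed)
```

Under these assumptions, a square closed-set matrix has no row that can account for the probability mass on the out-of-set class, which is the modeling gap the abstract describes.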