Paper Title
Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
Paper Authors
Abstract
While semi-supervised learning (SSL) has proven to be a promising way to leverage unlabeled data when labeled data is scarce, existing SSL algorithms typically assume that the training class distribution is balanced. However, SSL algorithms trained under imbalanced class distributions can suffer severely when evaluated against a balanced testing criterion, because their pseudo-labels for unlabeled data are biased toward the majority classes. To alleviate this issue, we formulate a convex optimization problem that softly refines the pseudo-labels generated by the biased model, and develop a simple algorithm, named Distribution Aligning Refinery of Pseudo-label (DARP), that solves it provably and efficiently. Under various class-imbalanced semi-supervised scenarios, we demonstrate the effectiveness of DARP and its compatibility with state-of-the-art SSL schemes.
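The abstract does not spell out the optimization problem or the DARP algorithm itself, so the following is only a rough illustration of the general idea of aligning soft pseudo-labels with a class distribution. It uses iterative proportional fitting (alternating column and row rescaling), a standard technique swapped in here for illustration, not the authors' procedure. The function name `align_pseudo_labels`, the parameter `target_dist` (an assumed known or estimated class distribution of the unlabeled data), and the toy inputs are all hypothetical.

```python
def align_pseudo_labels(probs, target_dist, n_iters=200):
    """Rescale soft pseudo-labels so per-class mass matches target_dist.

    Illustrative sketch only (iterative proportional fitting), not DARP.
    probs: list of rows, each a probability vector over classes.
    target_dist: assumed class proportions of the unlabeled set (sums to 1).
    """
    n = len(probs)
    k = len(target_dist)
    x = [list(row) for row in probs]  # work on a copy
    for _ in range(n_iters):
        # Column step: scale each class so its total mass is n * target share.
        for c in range(k):
            col_sum = sum(row[c] for row in x)
            if col_sum > 0:
                scale = n * target_dist[c] / col_sum
                for row in x:
                    row[c] *= scale
        # Row step: renormalize so every sample is a probability vector again.
        for row in x:
            s = sum(row)
            if s > 0:
                for c in range(k):
                    row[c] /= s
    return x


# Toy pseudo-labels biased toward class 0; the assumed target is uniform.
biased = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
refined = align_pseudo_labels(biased, [0.5, 0.5])
```

In this toy case the refined rows remain valid probability vectors while the total mass assigned to each class moves toward the assumed target. The relative ordering of samples within a class is preserved, because each entry is only multiplied by a per-row and a per-class factor.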