Title
Fairness Constraints in Semi-supervised Learning
Authors
Abstract
Fairness in machine learning has received considerable attention. However, most studies on fair learning focus on either supervised or unsupervised learning; very few consider the semi-supervised setting. Yet, in reality, most machine learning tasks rely on large datasets that contain both labeled and unlabeled data. One of the key issues in fair learning is the balance between fairness and accuracy. Previous studies argue that increasing the size of the training set can yield a better trade-off. We believe that enlarging the training set with unlabeled data may achieve a similar result. Hence, we develop a framework for fair semi-supervised learning, formulated as an optimization problem. It comprises a classifier loss to optimize accuracy, a label propagation loss to optimize predictions on unlabeled data, and fairness constraints over both labeled and unlabeled data to optimize the fairness level. The framework is instantiated with logistic regression and support vector machines under the fairness metrics of disparate impact and disparate mistreatment. We theoretically analyze the sources of discrimination in semi-supervised learning via a bias, variance and noise decomposition. Extensive experiments show that our method achieves fair semi-supervised learning and reaches a better trade-off between accuracy and fairness than fair supervised learning.
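The three-part objective described in the abstract (classifier loss + label propagation loss + a fairness constraint over labeled and unlabeled data) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the synthetic data, the nearest-neighbour pseudo-labeling used as a stand-in for label propagation, and the soft-penalty form of the fairness term (a covariance relaxation of disparate impact) are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): label y in {-1, +1}; binary sensitive
# attribute s correlated with y; feature 0 tracks s, feature 1 tracks y.
n_lab, n_unl = 80, 160
y_lab = rng.choice([-1.0, 1.0], n_lab)
y_hidden = rng.choice([-1.0, 1.0], n_unl)  # never shown to the learner
s_lab = np.where(rng.random(n_lab) < 0.8, (y_lab + 1) / 2, (1 - y_lab) / 2)
s_unl = np.where(rng.random(n_unl) < 0.8, (y_hidden + 1) / 2, (1 - y_hidden) / 2)
X_lab = np.column_stack([s_lab + 0.5 * rng.normal(size=n_lab),
                         y_lab + 0.5 * rng.normal(size=n_lab)])
X_unl = np.column_stack([s_unl + 0.5 * rng.normal(size=n_unl),
                         y_hidden + 0.5 * rng.normal(size=n_unl)])

# Crude stand-in for label propagation: pseudo-label each unlabeled point
# with the label of its nearest labeled neighbour.
dist = ((X_unl[:, None, :] - X_lab[None, :, :]) ** 2).sum(-1)
y_pseudo = y_lab[dist.argmin(axis=1)]

X_all = np.vstack([X_lab, X_unl])
s_all = np.concatenate([s_lab, s_unl])

def objective(w, lam, mu):
    # Classifier loss: logistic loss on labeled data (accuracy term).
    clf_loss = np.logaddexp(0.0, -y_lab * (X_lab @ w)).mean()
    # Label-propagation loss: logistic loss on pseudo-labeled unlabeled data.
    lp_loss = np.logaddexp(0.0, -y_pseudo * (X_unl @ w)).mean()
    # Fairness term over labeled AND unlabeled data: squared covariance
    # between the sensitive attribute and the signed boundary distance
    # (a common relaxation of disparate impact), weighted by mu.
    cov = np.mean((s_all - s_all.mean()) * (X_all @ w))
    return clf_loss + lam * lp_loss + mu * cov ** 2

def boundary_cov(w):
    """|cov(s, w^T x)| over all points: lower means less disparate impact."""
    return abs(np.mean((s_all - s_all.mean()) * (X_all @ w)))

w_plain = minimize(objective, np.zeros(2), args=(1.0, 0.0)).x   # no fairness term
w_fair = minimize(objective, np.zeros(2), args=(1.0, 10.0)).x   # fairness-penalized
```

Raising `mu` trades accuracy for fairness: the penalized solution `w_fair` should exhibit a smaller covariance between the sensitive attribute and the decision boundary than the unconstrained `w_plain`.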