Paper Title

Fairness-Aware Naive Bayes Classifier for Data with Multiple Sensitive Features

Paper Author

Boulitsakis-Logothetis, Stelios

Paper Abstract

Fairness-aware machine learning seeks to maximise utility in generating predictions while avoiding unfair discrimination based on sensitive attributes such as race, sex, religion, etc. An important line of work in this field is enforcing fairness during the training step of a classifier. A simple yet effective binary classification algorithm that follows this strategy is two-naive-Bayes (2NB), which enforces statistical parity, requiring that the groups comprising the dataset receive positive labels with the same likelihood. In this paper, we generalise this algorithm into N-naive-Bayes (NNB) to eliminate the simplification of assuming only two sensitive groups in the data and instead apply it to an arbitrary number of groups. We propose an extension of the original algorithm's statistical parity constraint and the post-processing routine that enforces statistical independence of the label and the single sensitive attribute. Then, we investigate its application on data with multiple sensitive features and propose a new constraint and post-processing routine to enforce differential fairness, an extension of established group-fairness constraints focused on intersectionalities. We empirically demonstrate the effectiveness of the NNB algorithm on US Census datasets and compare its accuracy and debiasing performance, as measured by disparate impact and DF-$\epsilon$ score, with similar group-fairness algorithms. Finally, we lay out important considerations users should be aware of before incorporating this algorithm into their application, and direct them to further reading on the pros, cons, and ethical implications of using statistical parity as a fairness criterion.
