Paper Title

Representativity Fairness in Clustering

Authors

Deepak P, Savitha Sam Abraham

Abstract

Incorporating fairness constructs into machine learning algorithms is a topic of much societal importance and recent interest. Clustering, a fundamental task in unsupervised learning that manifests across a number of web data scenarios, has also been a subject of attention within fair ML research. In this paper, we develop a novel notion of fairness in clustering, called representativity fairness. Representativity fairness is motivated by the need to alleviate disparity across objects' proximity to their assigned cluster representatives, to aid fairer decision making. We illustrate the importance of representativity fairness in real-world decision-making scenarios involving clustering and provide ways of quantifying objects' representativity and fairness over it. We develop a new clustering formulation, RFKM, that optimizes for representativity fairness along with clustering quality. Inspired by the $K$-Means framework, RFKM incorporates novel loss terms to formulate an objective function. The RFKM objective and optimization approach guide it towards clustering configurations that yield higher representativity fairness. Through an empirical evaluation over a variety of public datasets, we establish the effectiveness of our method. We illustrate that we are able to significantly improve representativity fairness at only marginal impact to clustering quality.
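The abstract does not give the exact form of the RFKM loss terms, but the quantities it refers to can be illustrated. Below is a minimal sketch, assuming representativity is measured as an object's distance to its assigned cluster representative (the centroid) and disparity as the variance of those distances across objects; the function name representativity_disparity and the toy data are illustrative assumptions, not the paper's implementation.

    import numpy as np
    from sklearn.cluster import KMeans

    def representativity_disparity(X, labels, centers):
        """Distance of each object to its assigned cluster representative,
        and the variance of those distances as a simple disparity measure."""
        dists = np.linalg.norm(X - centers[labels], axis=1)
        return dists, dists.var()

    # Toy data: two Gaussian blobs with different spreads, so objects in the
    # wider blob sit farther, on average, from their representative.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                   rng.normal(5.0, 2.0, size=(50, 2))])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    dists, disparity = representativity_disparity(X, km.labels_, km.cluster_centers_)
    print(f"mean distance to representative: {dists.mean():.3f}")
    print(f"disparity (variance of distances): {disparity:.3f}")

Under this reading, an RFKM-style objective would trade off the usual K-Means distortion (the mean of these distances) against a term penalizing their spread, so that no object is left markedly worse represented than the rest.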
