论文标题
在团体公平限制下最大化集合的幸福(技术报告)
Happiness Maximizing Sets under Group Fairness Constraints (Technical Report)
论文作者
论文摘要
从数据库中找到最大化集合(HMS)的幸福感,即选择一小部分元组,以保持相对于任何非负线线性效用函数,这是多标准决策的重要问题。当从一组个人中提取HMS以协助数据驱动的算法决定(例如招聘和入院)时,至关重要的是要确保HMS可以公平地代表没有偏见和歧视的不同候选人。但是,尽管HMS问题在数据库社区中进行了广泛的研究,但现有算法并未考虑组公平性,并且可能会提供一些不足的解决方案。 在本文中,我们提出并调查了HMS(Fairhms)的公平变体,该变体不仅可以最大化最小的幸福比,而且还可以确保从每个组中选择的元素数量属于预定义的下限和上限。与Vanilla HMS问题相似,我们表明Fairhms在三个及更高的维度上是NP-HARD。因此,我们首先提出了一种基于二维数据库中Fairhms的基于INTCOV的基于INTCOV的精确覆盖算法。然后,我们提出了一种在多维数据库上称为Fairhms的两尺度近似算法,通过将其转换为Matroid约束下的supsodular最大化问题。我们还设计了一种自适应抽样策略,以提高Bigreedy的实际效率。关于现实世界和合成数据集的广泛实验证实了我们建议的效率和效率。
Finding a happiness maximizing set (HMS) from a database, i.e., selecting a small subset of tuples that preserves the best score with respect to any nonnegative linear utility function, is an important problem in multi-criteria decision-making. When an HMS is extracted from a set of individuals to assist data-driven algorithmic decisions such as hiring and admission, it is crucial to ensure that the HMS can fairly represent different groups of candidates without bias and discrimination. However, although the HMS problem was extensively studied in the database community, existing algorithms do not take group fairness into account and may provide solutions that under-represent some groups. In this paper, we propose and investigate a fair variant of HMS (FairHMS) that not only maximizes the minimum happiness ratio but also guarantees that the number of tuples chosen from each group falls within predefined lower and upper bounds. Similar to the vanilla HMS problem, we show that FairHMS is NP-hard in three and higher dimensions. Therefore, we first propose an exact interval cover-based algorithm called IntCov for FairHMS on two-dimensional databases. Then, we propose a bicriteria approximation algorithm called BiGreedy for FairHMS on multi-dimensional databases by transforming it into a submodular maximization problem under a matroid constraint. We also design an adaptive sampling strategy to improve the practical efficiency of BiGreedy. Extensive experiments on real-world and synthetic datasets confirm the efficacy and efficiency of our proposal.