Impact of Missing Data Imputation on the Fairness and Accuracy of Graph Node Classifiers

Title

Impact of Missing Data Imputation on the Fairness and Accuracy of Graph Node Classifiers

Authors

Mansoor, Haris; Ali, Sarwan; Alam, Shafiq; Khan, Muhammad Asad; Hassan, Umair ul; Khan, Imdadullah

Abstract

The fairness of machine learning (ML) algorithms has recently attracted the interest of many researchers. Most ML methods exhibit bias against protected groups, which limits their applicability in many applications, such as crime rate prediction. Data may also contain missing values, which, if not handled appropriately, are known to harm fairness further. Many imputation methods have been proposed to deal with missing data; however, the effect of missing data imputation on fairness has not been well studied. In this paper, we analyze how imputing missing graph data (node attributes) affects fairness, using different embedding and neural network methods. Extensive experiments on six datasets reveal severe fairness issues when missing data are imputed for graph node classification. We also find that the choice of imputation method affects both fairness and accuracy. Our results provide valuable insights into fairness in graph data and into how to handle missing values in graphs efficiently. This work also suggests directions for theoretical studies of fairness in graph data.
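
The abstract claims that the choice of imputation method can change a classifier's fairness. The minimal, hypothetical sketch below illustrates that claim. It is not the authors' pipeline. The following are all assumptions made for illustration:

- the toy graph;
- the two imputers (global mean vs. neighbor mean over the adjacency list);
- the fixed threshold standing in for a trained node classifier;
- the use of statistical parity difference (SPD) as the fairness metric.

```python
from statistics import mean
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]          # node -> list of neighbors
Attrs = Dict[int, Optional[float]]    # node -> attribute value (None = missing)


def impute_global_mean(x: Attrs) -> Dict[int, float]:
    """Fill every missing value with the mean of all observed values."""
    fill = mean(v for v in x.values() if v is not None)
    return {n: (fill if v is None else v) for n, v in x.items()}


def impute_neighbor_mean(x: Attrs, adj: Graph) -> Dict[int, float]:
    """Fill a missing value with the mean of its observed neighbors,
    falling back to the global mean when no neighbor is observed."""
    fallback = mean(v for v in x.values() if v is not None)
    out: Dict[int, float] = {}
    for n, v in x.items():
        if v is not None:
            out[n] = v
            continue
        observed = [x[m] for m in adj[n] if x[m] is not None]
        out[n] = mean(observed) if observed else fallback
    return out


def threshold_classifier(x: Dict[int, float], cut: float = 0.5) -> Dict[int, int]:
    """Hypothetical stand-in for a trained node classifier: predict 1 above `cut`."""
    return {n: int(v > cut) for n, v in x.items()}


def statistical_parity_difference(pred: Dict[int, int], sensitive: Dict[int, int]) -> float:
    """SPD = P(y_hat = 1 | s = 1) - P(y_hat = 1 | s = 0)."""
    def positive_rate(group: int) -> float:
        members = [n for n in pred if sensitive[n] == group]
        return sum(pred[n] for n in members) / len(members)
    return positive_rate(1) - positive_rate(0)


# Hypothetical toy graph: two triangles, one per sensitive group (homophily).
ADJ: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
X: Attrs = {0: 0.9, 1: None, 2: 0.8, 3: 0.2, 4: None, 5: 0.3}
S: Dict[int, int] = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}

if __name__ == "__main__":
    for name, filled in [("global mean", impute_global_mean(X)),
                         ("neighbor mean", impute_neighbor_mean(X, ADJ))]:
        spd = statistical_parity_difference(threshold_classifier(filled), S)
        print(f"{name:>13}: SPD = {spd:.3f}")
```

On this toy graph each sensitive group forms its own cluster. Neighbor-mean imputation therefore copies each cluster's typical value into its missing node, which widens the gap between groups. The SPD is about 0.667 under global-mean imputation and 1.0 under neighbor-mean imputation. Real experiments would use learned embeddings or graph neural networks and real datasets, so this only shows the mechanism, not the paper's results.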
