Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models

Paper Title

Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models

Authors

Zhibo Wang, Xiaowei Dong, Henry Xue, Zhifei Zhang, Weifeng Chiu, Tao Wei, Kui Ren

Abstract

Prioritizing fairness is of central importance in artificial intelligence (AI) systems, especially in societal applications: hiring systems should recommend applicants equally across demographic groups, and risk assessment systems must eliminate racism in criminal justice. Existing efforts towards the ethical development of AI systems have either leveraged data science to mitigate biases in the training set or introduced fairness principles into the training process. A deployed AI system, however, may not be open to retraining or tuning in practice. By contrast, we propose a more flexible approach, fairness-aware adversarial perturbation (FAAP), which learns to perturb input data so that deployed models are blind to fairness-related features such as gender and ethnicity. The key advantage is that FAAP modifies neither the parameters nor the structure of the deployed model. To achieve this, we design a discriminator that distinguishes fairness-related attributes based on latent representations from the deployed model. Meanwhile, a perturbation generator is trained against the discriminator, such that no fairness-related features can be extracted from the perturbed inputs. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed FAAP. In addition, FAAP is validated on real-world commercial deployments whose model parameters are inaccessible, which shows the transferability of FAAP and suggests its potential for black-box adaptation.
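
The generator-versus-discriminator setup described in the abstract can be written as a two-player objective. The block below is a minimal sketch under assumed notation, not the paper's exact formulation: the abstract defines no symbols, so the split of the deployed model into a frozen feature extractor g and full predictor f, the generator G, the discriminator D, the perturbation budget \epsilon, the weight \lambda, and the task-preservation term \mathcal{L}_T are all illustrative assumptions. The paper's actual losses may contain further terms.

```latex
% Illustrative notation only; none of these symbols appear in the abstract.
% x: input, y: task label, z: fairness-related attribute (e.g. gender).
% f: deployed model (frozen), g: its latent feature extractor (frozen).
% G: perturbation generator, D: discriminator on latent representations.
\begin{align}
\hat{x} &= x + G(x), \qquad \lVert G(x) \rVert_\infty \le \epsilon
  && \text{(bounded perturbation, assumed)} \\
\mathcal{L}_D &= \mathbb{E}_{(x,z)}\big[\mathrm{CE}\big(D(g(\hat{x})),\, z\big)\big]
  && \text{(discriminator predicts } z \text{ from latents)} \\
\mathcal{L}_T &= \mathbb{E}_{(x,y)}\big[\mathrm{CE}\big(f(\hat{x}),\, y\big)\big]
  && \text{(task accuracy on perturbed inputs, assumed)} \\
D^\ast &= \arg\min_D \mathcal{L}_D, \qquad
G^\ast = \arg\min_G \big(-\mathcal{L}_D + \lambda\, \mathcal{L}_T\big)
  && \text{(alternating updates; } f, g \text{ never change)}
\end{align}
```

In this reading, only G and D are trained. The deployed model enters the objective solely through forward passes of f and g, which matches the abstract's claim that its parameters and structure stay untouched.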
