Paper Title
A Deep Dive into Dataset Imbalance and Bias in Face Identification
Paper Authors
Paper Abstract
As the deployment of automated face recognition (FR) systems proliferates, bias in these systems is not just an academic question, but a matter of public concern. Media portrayals often center imbalance as the main source of bias, i.e., that FR models perform worse on images of non-white people or women because these demographic groups are underrepresented in training data. Recent academic research paints a more nuanced picture of this relationship. However, previous studies of data imbalance in FR have focused exclusively on the face verification setting, while the face identification setting has been largely ignored, despite being deployed in sensitive applications such as law enforcement. This is an unfortunate omission, as "imbalance" is a more complex matter in identification: imbalance may arise not only in the training data but also in the testing data, and furthermore may affect either the proportion of identities belonging to each demographic group or the number of images belonging to each identity. In this work, we address this gap in the research by thoroughly exploring the effects of each kind of imbalance possible in face identification, and we discuss other factors that may impact bias in this setting.
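The abstract distinguishes two axes of imbalance in an identification dataset: how identities are distributed across demographic groups, and how images are distributed across identities. The minimal sketch below illustrates that distinction on a toy, hypothetical label list (the identity and group names are invented for illustration and do not come from the paper):

```python
from collections import Counter

# Hypothetical metadata: one (identity, demographic_group) record per image
# in a training or testing set. All names here are made up for illustration.
images = [
    ("id_0", "group_a"), ("id_0", "group_a"), ("id_0", "group_a"),
    ("id_1", "group_a"), ("id_1", "group_a"),
    ("id_2", "group_b"),
    ("id_3", "group_b"),
]

# Axis 1: proportion of identities belonging to each demographic group.
# Collapsing duplicate identity keys yields one group label per identity.
group_of = dict(images)
identities_per_group = Counter(group_of.values())

# Axis 2: number of images belonging to each identity.
images_per_identity = Counter(identity for identity, _ in images)

print(identities_per_group)   # identity-level balance across groups
print(images_per_identity)    # image-level balance across identities
```

A dataset can be balanced on one axis and skewed on the other (here, groups have equal identity counts, but `id_0` has three times as many images as `id_2`), which is why the paper treats them as separate kinds of imbalance.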