Haven't I Seen You Before? Assessing Identity Leakage in Synthetic Irises

Paper Title

Haven't I Seen You Before? Assessing Identity Leakage in Synthetic Irises

Authors

Patrick Tinsley, Adam Czajka, Patrick Flynn

Abstract

Generative Adversarial Networks (GANs) have proven to be a preferred method of synthesizing fake images of objects, such as faces, animals, and automobiles. It is not surprising these models can also generate ISO-compliant, yet synthetic iris images, which can be used to augment training data for iris matchers and liveness detectors. In this work, we trained one of the most recent GAN models (StyleGAN3) to generate fake iris images with two primary goals: (i) to understand the GAN's ability to produce "never-before-seen" irises, and (ii) to investigate the phenomenon of identity leakage as a function of the GAN's training time. Previous work has shown that personal biometric data can inadvertently flow from training data into synthetic samples, raising a privacy concern for subjects who accidentally appear in the training dataset. This paper presents analysis for three different iris matchers at varying points in the GAN training process to diagnose where and when authentic training samples are in jeopardy of leaking through the generative process. Our results show that while most synthetic samples do not show signs of identity leakage, a handful of generated samples match authentic (training) samples nearly perfectly, with consensus across all matchers. In order to prioritize privacy, security, and trust in the machine learning model development process, the research community must strike a delicate balance between the benefits of using synthetic data and the corresponding threats against privacy from potential identity leakage.
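
The consensus check mentioned in the abstract (a synthetic sample matching a training sample under every matcher) can be illustrated with a minimal sketch. This is not the authors' code: the function name `leaked_samples`, the matcher names, the scores, and the threshold of 0.2 are all hypothetical, and the sketch assumes each matcher outputs a dissimilarity score where lower means a closer match.

```python
import numpy as np

def leaked_samples(distances, thresholds):
    """Return indices of synthetic samples flagged by every matcher.

    distances: dict mapping a (hypothetical) matcher name to an array of
        shape (n_synthetic, n_training) holding dissimilarity scores,
        where lower means a closer match.
    thresholds: dict mapping the same names to a match threshold.
    """
    if not distances:
        raise ValueError("at least one matcher is required")
    flagged = None
    for name, scores in distances.items():
        # Best (smallest) score against any training sample, per synthetic sample.
        closest = np.asarray(scores).min(axis=1)
        hits = closest <= thresholds[name]
        # Keep only samples that every matcher so far agrees on.
        flagged = hits if flagged is None else (flagged & hits)
    return np.flatnonzero(flagged)

# Toy scores for 3 synthetic samples against 2 training samples (made-up values).
distances = {
    "matcher_a": np.array([[0.45, 0.02], [0.48, 0.47], [0.10, 0.46]]),
    "matcher_b": np.array([[0.50, 0.05], [0.49, 0.44], [0.41, 0.43]]),
    "matcher_c": np.array([[0.47, 0.01], [0.46, 0.45], [0.08, 0.44]]),
}
thresholds = {"matcher_a": 0.2, "matcher_b": 0.2, "matcher_c": 0.2}

print(leaked_samples(distances, thresholds))
```

In this toy data only the first synthetic sample falls below the threshold for all three matchers. The third is flagged by two matchers but not by `matcher_b`, so it is excluded from the consensus set. Running the same check on samples drawn from successive training snapshots would be one way to track leakage as a function of training time.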
