Paper Title
Disrupting Deepfakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems
Paper Authors
Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff
Paper Abstract
Face modification systems using deep learning have become increasingly powerful and accessible. Given images of a person's face, such systems can generate new images of that same person under different expressions and poses. Some systems can also modify targeted attributes such as hair color or age. Such manipulated images and videos have been coined Deepfakes. To prevent a malicious user from generating modified images of a person without their consent, we tackle the new problem of generating adversarial attacks against such image translation systems, attacks which disrupt the resulting output image. We call this problem disrupting deepfakes. Most image translation architectures are generative models conditioned on an attribute (e.g., put a smile on this person's face). We are the first to propose and successfully apply (1) class-transferable adversarial attacks that generalize across conditioning classes, meaning the attacker does not need knowledge of the conditioning class, and (2) adversarial training for generative adversarial networks (GANs) as a first step towards robust image translation networks. Finally, in gray-box scenarios, blurring can mount a successful defense against disruption. We present a spread-spectrum adversarial attack that evades blur defenses. Our open-source code can be found at https://github.com/natanielruiz/disrupting-deepfakes.
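To make the core idea concrete, below is a minimal PyTorch-style sketch of a disruption attack in the spirit the abstract describes: an iterative FGSM-style perturbation that pushes the generator's output away from its clean output. The generator `G`, conditioning attribute `c`, and the budget and step hyperparameters are illustrative assumptions, not the paper's exact formulation (see the linked repository for the authors' implementation).

```python
import torch
import torch.nn.functional as F

def disrupt(G, x, c, eps=0.05, step=0.01, iters=10):
    """Iterative FGSM-style disruption (sketch, not the paper's exact method).

    Finds a small L-infinity perturbation delta such that the translated
    output G(x + delta, c) is maximally distorted relative to the clean
    output G(x, c). G, x, c and all hyperparameters are illustrative.
    """
    with torch.no_grad():
        y_clean = G(x, c)                      # reference output on the clean input
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.mse_loss(G(x + delta, c), y_clean)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()  # gradient *ascent*: distort the output
            delta.clamp_(-eps, eps)            # stay inside the perturbation budget
        delta.grad.zero_()
    return (x + delta).detach()                # image-range clamping omitted for brevity
```

A class-transferable variant of this attack would optimize the same objective in expectation over conditioning classes, so the attacker need not know which attribute the manipulator will choose.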
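The abstract also mentions a spread-spectrum attack that survives blur defenses. A hedged sketch of that idea, assuming the same hypothetical `G` and using torchvision's Gaussian blur to stand in for the defender's (unknown) preprocessing: each iteration samples a different blur strength, so the perturbation is optimized to disrupt the generator across a spectrum of blurs rather than a single one.

```python
import random
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def spread_spectrum_disrupt(G, x, c, sigmas=(0.5, 1.0, 1.5, 2.0),
                            eps=0.05, step=0.01, iters=40):
    """Spread-spectrum disruption (sketch): attack the generator behind a
    randomly sampled Gaussian blur each iteration so the perturbation
    remains effective under a range of blur defenses. All names and
    hyperparameters are illustrative assumptions."""
    with torch.no_grad():
        y_clean = G(x, c)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        sigma = random.choice(sigmas)          # sample this iteration's blur strength
        x_adv = TF.gaussian_blur(x + delta, kernel_size=5, sigma=sigma)
        loss = F.mse_loss(G(x_adv, c), y_clean)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()
```

Because the blur is differentiable, gradients flow through the preprocessing to the perturbation, which is what lets a single perturbation "spread" its effectiveness across the sampled blur settings.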