Paper Title
Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks
Paper Authors
Paper Abstract
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades the model performance on normal images, and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting ImageNet samples onto the manifold learned by StyleGAN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by Lp adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT), in which adversarial perturbations in both the latent and image spaces are used to robustify the model. DMAT improves performance on normal images and achieves robustness against Lp attacks comparable to standard adversarial training. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks that manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information to enhance the robustness of deep learning models against various types of novel adversarial attacks.
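The abstract describes DMAT only at a high level: the model is trained against an on-manifold adversary (PGD in the generator's latent space) and an off-manifold adversary (standard Lp PGD in image space) simultaneously. The following is a minimal PyTorch-style sketch of one such training step, assuming a frozen StyleGAN-like generator `G`, a classifier `model`, and L-infinity PGD as the inner maximizer in both spaces; the function names, perturbation radii, and step sizes here are illustrative assumptions, not the paper's exact formulation or settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(loss_fn, x, eps, alpha, steps):
    """Generic L-inf PGD: find a perturbation delta in the eps-ball
    that (approximately) maximizes loss_fn(x + delta)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(x + delta)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()

def dmat_step(model, G, w, y,
              eps_img=8/255, alpha_img=2/255,   # image-space (off-manifold) budget
              eps_lat=0.02, alpha_lat=0.005,    # latent-space (on-manifold) budget
              steps=10):
    """One DMAT training step on a batch of latent codes w with labels y.
    G is a frozen, differentiable generator mapping latents to images.
    All hyperparameter values are placeholders, not the paper's settings."""
    x = G(w).detach()  # on-manifold clean images

    # Off-manifold adversary: standard L-inf PGD directly in image space.
    x_adv = pgd_attack(lambda xi: F.cross_entropy(model(xi), y),
                       x, eps_img, alpha_img, steps)

    # On-manifold adversary: PGD in the generator's latent space, so the
    # perturbed image G(w_adv) stays on the learned image manifold.
    w_adv = pgd_attack(lambda wi: F.cross_entropy(model(G(wi)), y),
                       w, eps_lat, alpha_lat, steps)

    # Train against both adversaries jointly.
    loss = (F.cross_entropy(model(x_adv), y)
            + F.cross_entropy(model(G(w_adv).detach()), y))
    return loss
```

In a training loop, one would call `dmat_step` on each batch, backpropagate the returned loss, and step the classifier's optimizer; `G` stays frozen throughout, serving only to realize on-manifold perturbations.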