Paper Title
Increasing Confidence in Adversarial Robustness Evaluations
Paper Authors
Paper Abstract
Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses have held up to their claims, because correctly evaluating robustness is extremely challenging: weak attacks often fail to find adversarial examples even when, unbeknownst to the evaluator, such examples exist, thereby making a vulnerable network look robust. In this paper, we propose a test to identify weak attacks and, thus, weak defense evaluations. Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample. Consequently, any correct attack must succeed in breaking this modified network. For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while the stronger attacks that break these defenses pass it. We hope that attack unit tests, such as ours, will become a major component of future robustness evaluations and increase confidence in an empirical field that is currently riddled with skepticism.
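To make the mechanism concrete, below is a minimal Python/NumPy sketch of an attack unit test in this spirit. It builds a toy linear classifier whose decision boundary is deliberately placed inside the epsilon-ball around a clean input, so an adversarial example provably exists, and then asserts that a candidate attack finds it. The classifier construction, the `gradient_sign_attack` routine, and all parameter choices are illustrative assumptions for this sketch, not the paper's actual test, which modifies the defended network itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 32 * 32 * 3, 8 / 255          # toy input dimension and L-inf budget

x = rng.uniform(0.0, 1.0, size=d)      # a clean input sample
w = rng.choice([-1.0, 1.0], size=d)    # random hyperplane normal (sign vector)

# Place the decision boundary strictly inside the eps-ball around x:
#   logit(z) = w @ (z - x) - b  with  0 < b < eps * d  (here b = eps * d / 2).
# Then logit(x) = -b < 0 (classified "clean"), while z* = x + eps * w attains
# logit(z*) = eps * d / 2 > 0, so an adversarial example provably exists.
b = 0.5 * eps * d

def logit(z):
    return w @ (z - x) - b

def gradient_sign_attack(z, steps=1):
    """Stand-in for the attack under test; for this linear model the
    input gradient of the logit is simply w."""
    step = eps / steps
    for _ in range(steps):
        z = z + step * np.sign(w)           # ascend the logit
        z = np.clip(z, x - eps, x + eps)    # project back onto the eps-ball
    return z

# The unit test: any correct attack must flip this modified classifier.
z_adv = gradient_sign_attack(x.copy())
assert logit(x) < 0 < logit(z_adv), "attack failed the unit test"
print("attack passed: found the guaranteed adversarial example")
```

A weak attack, for instance one that takes no steps or never moves in the planted direction, would fail the assertion even though an adversarial example is guaranteed to exist, which mirrors how the paper's test flags insufficient evaluations.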