Paper Title
Increasing Confidence in Adversarial Robustness Evaluations
Paper Authors
Paper Abstract
Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses have held up to their claims, because correctly evaluating robustness is extremely challenging: weak attacks often fail to find adversarial examples even when, unbeknownst to the evaluator, such examples exist, thereby making a vulnerable network look robust. In this paper, we propose a test to identify weak attacks and, thus, weak defense evaluations. Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample. Consequently, any correct attack must succeed in breaking this modified network. For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while the stronger attacks that break these defenses pass it. We hope that attack unit tests, such as ours, will become a major component of future robustness evaluations and increase confidence in an empirical field that is currently riddled with skepticism.
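To make the mechanism concrete, below is a minimal Python/NumPy sketch of an attack unit test in this spirit. It builds a toy linear classifier whose decision boundary is deliberately placed inside the epsilon-ball around a clean input, so an adversarial example provably exists, and then asserts that a candidate attack finds it. The classifier construction, the `gradient_sign_attack` routine, and all parameter choices are illustrative assumptions for this sketch, not the paper's actual test, which modifies the defended network itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 32 * 32 * 3, 8 / 255          # toy input dimension and L-inf budget

x = rng.uniform(0.0, 1.0, size=d)      # a clean input sample
w = rng.choice([-1.0, 1.0], size=d)    # random hyperplane normal (sign vector)

# Place the decision boundary strictly inside the eps-ball around x:
#   logit(z) = w @ (z - x) - b  with  0 < b < eps * d  (here b = eps * d / 2).
# Then logit(x) = -b < 0 (classified "clean"), while z* = x + eps * w attains
# logit(z*) = eps * d / 2 > 0, so an adversarial example provably exists.
b = 0.5 * eps * d

def logit(z):
    return w @ (z - x) - b

def gradient_sign_attack(z, steps=1):
    """Stand-in for the attack under test; for this linear model the
    input gradient of the logit is simply w."""
    step = eps / steps
    for _ in range(steps):
        z = z + step * np.sign(w)           # ascend the logit
        z = np.clip(z, x - eps, x + eps)    # project back onto the eps-ball
    return z

# The unit test: any correct attack must flip this modified classifier.
z_adv = gradient_sign_attack(x.copy())
assert logit(x) < 0 < logit(z_adv), "attack failed the unit test"
print("attack passed: found the guaranteed adversarial example")
```

A weak attack, for instance one that takes no steps or never moves in the planted direction, would fail the assertion even though an adversarial example is guaranteed to exist, which mirrors how the paper's test flags insufficient evaluations.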