Paper Title

Saliency Attack: Towards Imperceptible Black-box Adversarial Attack

Paper Authors

Dai, Zeyu; Liu, Shengcai; Tang, Ke; Li, Qing

Paper Abstract

Deep neural networks are vulnerable to adversarial examples, even in the black-box setting where the attacker only has access to the model output. Recent studies have devised effective black-box attacks with high query efficiency. However, such performance is often accompanied by compromises in attack imperceptibility, hindering the practical use of these approaches. In this paper, we propose to restrict the perturbations to a small salient region to generate adversarial examples that can hardly be perceived. This approach is readily compatible with many existing black-box attacks and can significantly improve their imperceptibility with little degradation in attack success rate. Further, we propose the Saliency Attack, a new black-box attack aiming to refine the perturbations in the salient region to achieve even better imperceptibility. Extensive experiments show that compared to the state-of-the-art black-box attacks, our approach achieves much better imperceptibility scores, including most apparent distortion (MAD), $L_0$ and $L_2$ distances, and also obtains significantly higher success rates judged by a human-like threshold on MAD. Importantly, the perturbations generated by our approach are interpretable to some extent. Finally, it is also demonstrated to be robust to different detection-based defenses.
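To make the core idea concrete, below is a minimal NumPy sketch of restricting a perturbation to a salient region. It is not the paper's implementation: the function name, the source of the binary saliency mask, and the `epsilon` budget are all illustrative assumptions, and the perturbation is assumed to come from any existing black-box attack.

```python
import numpy as np


def apply_salient_perturbation(image, perturbation, saliency_mask, epsilon=8 / 255.0):
    """Constrain a black-box attack's perturbation to a salient region (illustrative sketch).

    image:          float array in [0, 1], shape (H, W, C)
    perturbation:   perturbation proposed by an arbitrary black-box attack, same shape as image
    saliency_mask:  binary array, shape (H, W), where 1 marks a salient pixel
    epsilon:        per-pixel L_inf budget (hypothetical value)
    """
    # Broadcast the 2-D mask over the colour channels.
    mask = saliency_mask[..., None].astype(image.dtype)
    # Zero out the perturbation outside the salient region and clip it to the budget.
    delta = np.clip(perturbation * mask, -epsilon, epsilon)
    # Keep the adversarial example a valid image.
    return np.clip(image + delta, 0.0, 1.0)


if __name__ == "__main__":
    # Random placeholders stand in for a real image, saliency mask and attack output.
    rng = np.random.default_rng(0)
    x = rng.random((224, 224, 3)).astype(np.float32)
    mask = (rng.random((224, 224)) > 0.8).astype(np.float32)
    delta = rng.uniform(-8 / 255, 8 / 255, x.shape).astype(np.float32)
    x_adv = apply_salient_perturbation(x, delta, mask)
    print(x_adv.shape, float(np.abs(x_adv - x).max()))
```

In this sketch, any query-efficient black-box attack could produce `perturbation`; the masking step simply confines its support to the salient region, which is the compatibility property the abstract refers to.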
