Paper Title

Adversarial Robustness Across Representation Spaces

Paper Authors

Pranjal Awasthi, George Yu, Chun-Sung Ferng, Andrew Tomkins, Da-Cheng Juan

Paper Abstract

Adversarial robustness concerns the susceptibility of deep neural networks to imperceptible perturbations made at test time. In the context of image tasks, many algorithms have been proposed to make neural networks robust to adversarial perturbations of the input pixels. These perturbations are typically measured in an $\ell_p$ norm. However, robustness often holds only for the specific attack used during training. In this work we extend the above setting to consider the problem of training deep neural networks that are simultaneously robust to perturbations applied in multiple natural representation spaces. For the case of image data, examples include the standard pixel representation as well as the representation in the discrete cosine transform (DCT) basis. We design a theoretically sound algorithm with formal guarantees for the above problem. Furthermore, our guarantees also hold when the goal is robustness with respect to multiple $\ell_p$-norm-based attacks. We then derive an efficient practical implementation and demonstrate the effectiveness of our approach on standard image-classification datasets.
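To make the notion of attacks in different representation spaces concrete, here is a minimal sketch (not the authors' implementation; the 32x32 image size and the 0.01 budgets are arbitrary illustrative choices) using SciPy's orthonormal 2-D DCT. It constructs one perturbation bounded in the pixel basis and one bounded in the DCT basis, then measures each perturbation's $\ell_\infty$ norm in both spaces:

```python
import numpy as np
from scipy.fft import dctn, idctn  # orthonormal 2-D DCT and its inverse

n = 32  # illustrative image size, not a value from the paper

# Attack bounded in pixel space: a uniform +0.01 shift on every pixel.
delta_pix = 0.01 * np.ones((n, n))

# Attack bounded in DCT space: coefficients of magnitude 0.01, with signs
# chosen so the inverse transform piles energy onto a single pixel.
point = np.zeros((n, n))
point[0, 0] = 1.0
delta_dct = idctn(0.01 * np.sign(dctn(point, norm="ortho")), norm="ortho")

# Measure both perturbations in both bases.
for name, d in [("pixel-bounded", delta_pix), ("DCT-bounded", delta_dct)]:
    print(f"{name:13s}  l_inf(pixels) = {np.abs(d).max():.3f}  "
          f"l_inf(DCT) = {np.abs(dctn(d, norm='ortho')).max():.3f}")
```

Each perturbation stays inside a 0.01 $\ell_\infty$ ball in its own basis yet is more than 30 times larger in the other, so robustness certified in one representation space says little about the other; this is the gap that training for simultaneous robustness across representation spaces is meant to close.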
