Paper Title
Adversarial Robustness of Stabilized Neural ODEs Might be from Obfuscated Gradients
Paper Authors
Paper Abstract
In this paper, we introduce a provably stable architecture for Neural Ordinary Differential Equations (ODEs) that achieves non-trivial adversarial robustness under white-box adversarial attacks even when the network is trained naturally. Most existing defense methods that withstand strong white-box attacks need to train the network adversarially to improve its robustness, and hence must strike a trade-off between natural accuracy and adversarial robustness. Inspired by dynamical systems theory, we design a stabilized neural ODE network named SONet, whose ODE blocks are skew-symmetric and provably input-output stable. With natural training, SONet achieves robustness comparable to state-of-the-art adversarial defense methods without sacrificing natural accuracy. Even replacing only the first layer of a ResNet with such an ODE block yields a further improvement in robustness: e.g., under a PGD-20 ($\ell_\infty=0.031$) attack on the CIFAR-10 dataset, it achieves 91.57\% natural accuracy and 62.35\% robust accuracy, while a counterpart ResNet architecture trained with TRADES achieves natural and robust accuracies of 76.29\% and 45.24\%, respectively. To understand the possible reasons behind this surprisingly good result, we further explore the mechanism underlying such adversarial robustness. We show that the adaptive-stepsize numerical ODE solver DOPRI5 has a gradient-masking effect that defeats PGD attacks, which are sensitive to the gradient information of the training loss; on the other hand, it cannot fool the CW attack, which uses robust gradients, or the gradient-free SPSA attack. This provides a new explanation that the adversarial robustness of ODE-based networks mainly comes from obfuscated gradients in numerical ODE solvers.
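To give intuition for why a skew-symmetric ODE block is input-output stable, consider the linear case $\dot{x} = (W - W^\top)x$: since $x^\top(W - W^\top)x = 0$, the state norm is conserved, so small input perturbations cannot be amplified by the block. The following NumPy sketch (an illustration of this stability property, not the paper's SONet implementation) integrates such a system with a fixed-step RK4 solver and checks that the norm drift stays tiny:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
A = W - W.T  # skew-symmetric: A.T == -A, hence x.T @ A @ x == 0

def rk4_step(x, h):
    # one classic 4th-order Runge-Kutta step for dx/dt = A @ x
    k1 = A @ x
    k2 = A @ (x + 0.5 * h * k1)
    k3 = A @ (x + 0.5 * h * k2)
    k4 = A @ (x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = rng.standard_normal(4)
n0 = np.linalg.norm(x)
for _ in range(1000):          # integrate to t = 10 with h = 0.01
    x = rk4_step(x, 0.01)

# the exact flow exp(A*t) is orthogonal, so the norm is conserved;
# RK4 only introduces a tiny numerical drift
print(abs(np.linalg.norm(x) - n0) < 1e-4)
```

In the nonlinear blocks of the paper the same quadratic-form argument underlies the input-output stability proof; here the linear case suffices to show the mechanism.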
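The SPSA attack mentioned above needs only loss values, not gradients, which is why gradient masking in the solver cannot fool it. As a minimal sketch (an illustrative SPSA gradient estimator, not the paper's attack setup), the estimator averages symmetric finite differences along random Rademacher directions:

```python
import numpy as np

rng = np.random.default_rng(1)

def spsa_gradient(f, x, c=1e-3, n_samples=2000):
    # average simultaneous-perturbation estimates:
    #   g ~ [f(x + c*d) - f(x - c*d)] / (2c) * d, with Rademacher d
    g = np.zeros_like(x)
    for _ in range(n_samples):
        d = rng.choice([-1.0, 1.0], size=x.shape)
        g += (f(x + c * d) - f(x - c * d)) / (2 * c) * d
    return g / n_samples

# sanity check on a quadratic loss, whose true gradient is 2*x
f = lambda x: np.sum(x ** 2)
x = np.array([1.0, -2.0, 0.5])
g = spsa_gradient(f, x)
print(np.allclose(g, 2 * x, atol=0.5))
```

An attack would then take signed steps along this estimate, so a model that merely hides gradient information, as the adaptive-stepsize DOPRI5 solver does, offers no protection against it.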