Paper Title
Towards Alternative Techniques for Improving Adversarial Robustness: Analysis of Adversarial Training at a Spectrum of Perturbations
Paper Authors
Paper Abstract
Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations and common corruptions in the last few years. Algorithm design of AT and its variants is focused on training models at a specified perturbation strength $ε$, using only the feedback from the performance of that $ε$-robust model to improve the algorithm. In this work, we focus on models trained across a spectrum of $ε$ values. We analyze three perspectives: model performance, intermediate feature precision, and convolution filter sensitivity. In each, we identify alternative improvements to AT that would otherwise not have been apparent at a single $ε$. Specifically, we find that for a PGD attack at some strength $δ$, there is an AT model at some slightly larger strength $ε$, but no greater, that generalizes best to it. Hence, we propose overdesigning for robustness, where we suggest training models at an $ε$ just above $δ$. Second, we observe (across various $ε$ values) that robustness is highly sensitive to the precision of intermediate features, particularly those after the first and second layers. Thus, we propose adding a simple quantization to defenses that improves accuracy on seen and unseen adaptive attacks. Third, we analyze the convolution filters of each layer of models trained at increasing $ε$ and notice that those of the first and second layers may be solely responsible for amplifying input perturbations. We present our findings and demonstrate our techniques through experiments with ResNet and WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.
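The quantization idea above can be illustrated with a minimal sketch: uniformly quantizing intermediate activations snaps nearby feature values to the same level, so sufficiently small perturbations of the features are absorbed. The bit-width, clipping range, and NumPy implementation below are assumptions for illustration only, not the paper's exact defense.

```python
import numpy as np

def quantize_features(x, n_bits=4, x_min=0.0, x_max=1.0):
    """Uniformly quantize activations to 2**n_bits - 1 steps in [x_min, x_max].

    Illustrative sketch: the paper proposes quantizing intermediate features
    (e.g. after the first and second layers); the scheme and bit-width here
    are hypothetical choices, not the authors' implementation.
    """
    levels = 2 ** n_bits - 1
    x = np.clip(x, x_min, x_max)          # clip to the assumed feature range
    scale = (x_max - x_min) / levels      # width of one quantization step
    return np.round((x - x_min) / scale) * scale + x_min

# Perturbations smaller than half a quantization step are removed:
feat = np.array([0.10, 0.52, 0.97])
perturbed = feat + 0.01  # small adversarial-style noise on the features
print(np.allclose(quantize_features(feat), quantize_features(perturbed)))
```

With 4 bits the step size is 1/15 ≈ 0.067, so the 0.01 feature perturbation above maps both tensors to identical quantized values; larger perturbations or finer quantization would of course break this, which is the trade-off such a defense must tune.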