Paper Title

Learnable Boundary Guided Adversarial Training

Paper Authors

Jiequan Cui, Shu Liu, Liwei Wang, Jiaya Jia

Abstract

Previous adversarial training improves model robustness at the cost of accuracy on natural data. In this paper, we reduce this natural accuracy degradation. We use the logits from a clean model to guide the learning of a robust model, based on the observation that logits from a well-trained clean model embed the most discriminative features of natural data, e.g., a generalizable classifier boundary. Our solution is to constrain the logits of the robust model, which takes adversarial examples as input, to be similar to those of the clean model fed with the corresponding natural data. This lets the robust model inherit the classifier boundary of the clean model. Moreover, we observe that such boundary guidance not only preserves high natural accuracy but also benefits model robustness, which gives new insight to and facilitates progress in the adversarial learning community. Finally, extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet demonstrate the effectiveness of our method. We achieve new state-of-the-art robustness on CIFAR-100, without additional real or synthetic data, under the AutoAttack benchmark (https://github.com/fra31/auto-attack). Our code is available at https://github.com/dvlab-research/LBGAT.
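Below is a minimal PyTorch sketch of the boundary guidance described in the abstract: the robust model's logits on adversarial examples are pulled toward the clean model's logits on the corresponding natural inputs, while the clean model is trained with cross-entropy on natural data and both models are updated jointly. The helper names (`pgd_attack`, `lbgat_loss`), the MSE logit distance, and all hyperparameter values are illustrative assumptions, not the paper's exact implementation; see the official repository for that.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """PGD within an L-infinity ball (illustrative attack settings)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def lbgat_loss(clean_model, robust_model, x, y, gamma=1.0):
    """Boundary-guided adversarial training loss (sketch; gamma is a hypothetical weight).

    The robust model sees adversarial examples; its logits are constrained to
    match the clean model's logits on the corresponding natural data, so the
    robust model inherits the clean model's classifier boundary.
    """
    x_adv = pgd_attack(robust_model, x, y)
    logits_clean = clean_model(x)        # natural data -> clean model
    logits_robust = robust_model(x_adv)  # adversarial data -> robust model
    guide = F.mse_loss(logits_robust, logits_clean)  # boundary guidance term
    ce = F.cross_entropy(logits_clean, y)            # keeps the clean boundary accurate
    return ce + gamma * guide
```

A training loop would backpropagate this combined loss and step optimizers for both networks; at test time only the robust model is evaluated.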
