Paper Title
Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning
Paper Authors
Paper Abstract
Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve adversarial robustness, adversarial data augmentation (ADA) has been widely adopted to cover more of the attack search space by adding textual adversarial examples during training. However, the number of adversarial examples used for augmentation remains extremely insufficient relative to the exponentially large attack search space. In this work, we propose a simple and effective method to cover a much larger proportion of the attack search space, called Adversarial and Mixup Data Augmentation (AMDA). Specifically, AMDA linearly interpolates the representations of pairs of training samples to form new virtual samples, which are more abundant and diverse than the discrete textual adversarial examples of conventional ADA. Moreover, to fairly evaluate the robustness of different models, we adopt a challenging evaluation setup that generates a new set of adversarial examples targeting each model. In text classification experiments with BERT and RoBERTa, AMDA achieves significant robustness gains under two strong adversarial attacks and alleviates the performance degradation that ADA causes on clean data. Our code is available at: https://github.com/thunlp/MixADA
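The core interpolation step described in the abstract follows the standard mixup recipe: sample a mixing coefficient from a Beta distribution, then take convex combinations of two samples' representations and their labels. Below is a minimal, hedged sketch of that idea (not the authors' actual implementation; the function name `mixup_pair`, the `alpha` default, and the toy vectors are illustrative assumptions):

```python
import numpy as np

def mixup_pair(h_i, h_j, y_i, y_j, alpha=1.0, rng=None):
    """Linearly interpolate a pair of sample representations and labels.

    h_i, h_j : hidden representations of two training samples
               (e.g., PLM sentence embeddings); same shape.
    y_i, y_j : one-hot (or soft) label vectors; same shape.
    alpha    : Beta-distribution shape parameter, a common mixup choice
               (illustrative default, not from the paper).
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)             # mixing coefficient in [0, 1]
    h_mix = lam * h_i + (1.0 - lam) * h_j    # virtual sample representation
    y_mix = lam * y_i + (1.0 - lam) * y_j    # corresponding soft label
    return h_mix, y_mix

# Toy usage: mix a clean sample's representation with an adversarial one,
# yielding a virtual sample that lies between the two in representation space.
h_clean = np.array([0.2, -1.0, 0.5])
h_adv   = np.array([0.4, -0.8, 0.1])
y_clean = np.array([1.0, 0.0])
y_adv   = np.array([1.0, 0.0])
h_mix, y_mix = mixup_pair(h_clean, h_adv, y_clean, y_adv)
```

Because the virtual samples are continuous interpolations rather than discrete texts, infinitely many of them can be drawn from each pair, which is the intuition behind AMDA covering more of the attack search space than ADA alone.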