Paper Title
Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations
Paper Authors
Paper Abstract
Adversarial attacks aim to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, differing in their level of access to the victim model. Among them, attack models that require only the victim model's output are better suited to real-world attack scenarios. However, to achieve high attack performance, these models usually need to query the victim model an excessive number of times, which is neither efficient nor practical. To tackle this problem, we propose a reinforcement-learning-based attack model that can learn from attack history and launch attacks more efficiently. In experiments, we evaluate our model by attacking several state-of-the-art victim models on benchmark datasets for multiple tasks, including sentiment analysis, text classification, and natural language inference. Experimental results demonstrate that our model consistently achieves both better attack performance and higher efficiency than recently proposed baseline methods. We also find that our attack model yields greater robustness improvements for the victim model when used for adversarial training. All the code and data of this paper will be made public.
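To make the core idea concrete, the sketch below illustrates a reinforcement-learning-based black-box attack in miniature. It is not the paper's method: the victim model (`victim_score`), the synonym table (`SYNONYMS`), and the single-logit REINFORCE policy are all hypothetical toy stand-ins, and the reward (the per-query drop in the victim's confidence) is just one plausible choice. The point is only to show how a policy can learn, from the outcomes of past queries, which token positions are worth perturbing, so fewer queries are wasted.

```python
# A minimal, self-contained sketch of a reinforcement-learning-based
# black-box textual attack. Everything here is a hypothetical toy:
# `victim_score`, `SYNONYMS`, and the REINFORCE policy are illustrative
# stand-ins, not the paper's actual model.
import math
import random

# Toy victim model: maps tokens to a "positive" probability.
# A real attack would query a deployed model's output instead.
POSITIVE_WORDS = {"good", "great", "excellent", "wonderful"}

def victim_score(tokens):
    hits = sum(t in POSITIVE_WORDS for t in tokens)
    return 1.0 / (1.0 + math.exp(-(2.0 * hits - 1.0)))  # sigmoid

# Tiny synonym table defining the substitution action space.
SYNONYMS = {
    "good": ["decent", "fine"],
    "great": ["notable", "sizable"],
    "excellent": ["adequate", "passable"],
    "wonderful": ["curious", "strange"],
}

def attack(tokens, episodes=200, lr=0.5):
    """REINFORCE over which token position to substitute next.

    The reward for each query is the drop in the victim's positive
    probability; positions that never help are learned to be avoided,
    which is how the policy saves queries over time."""
    logits = [0.0] * len(tokens)   # one policy logit per position
    current = list(tokens)
    score = victim_score(current)
    for _ in range(episodes):
        # Softmax over positions, then sample one action.
        mx = max(logits)
        weights = [math.exp(l - mx) for l in logits]
        total = sum(weights)
        probs = [w / total for w in weights]
        pos = random.choices(range(len(tokens)), weights=probs)[0]
        word = current[pos]
        if word not in SYNONYMS:
            reward = -0.1          # wasted query: discourage this position
        else:
            cand = list(current)
            cand[pos] = random.choice(SYNONYMS[word])
            new_score = victim_score(cand)   # one black-box query
            reward = score - new_score
            if reward > 0:                   # keep substitutions that help
                current, score = cand, new_score
            if score < 0.5:
                return current               # predicted label flipped: success
        # Simplified policy-gradient step on the sampled logit only.
        logits[pos] += lr * reward * (1.0 - probs[pos])
    return None                              # query budget exhausted

if __name__ == "__main__":
    sentence = "the movie was good and the cast was wonderful".split()
    print(attack(sentence))
```

Note that in the paper's setting the policy is trained across the attack histories of many examples, so experience transfers between inputs, whereas this toy learns only within a single attack; the query-saving mechanism, however, is the same in spirit.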