Generating Fluent Adversarial Examples for Natural Languages

Paper Title

Generating Fluent Adversarial Examples for Natural Languages

Authors

Huangzhao Zhang, Hao Zhou, Ning Miao, Lei Li

Abstract

Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. First, because the sentence space is discrete, it is difficult to make small perturbations along the direction of the gradients. Second, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling with a proposal designed under the guidance of gradients. Experiments on IMDB and SNLI show that the proposed MHA outperforms the baseline model in attacking capability. Adversarial training with MHA also leads to better robustness and performance.
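
To make the sampling idea in the abstract more concrete, here is a minimal sketch of a generic Metropolis-Hastings loop over a discrete space. It is not the authors' implementation. The four-word vocabulary, the target weights, and the function names are all hypothetical, and a uniform proposal is swapped in for the gradient-guided proposal the abstract describes.

```python
import random
from collections import Counter

# Hypothetical toy vocabulary with made-up, unnormalized target weights.
# These are illustration values only and do not come from the paper.
WEIGHTS = {"good": 1.0, "great": 2.0, "fine": 3.0, "bad": 4.0}
VOCAB = list(WEIGHTS)


def target(word):
    """Unnormalized stationary weight pi(word)."""
    return WEIGHTS[word]


def propose(word, rng):
    """Draw a candidate word. Assumption: uniform over the vocabulary,
    ignoring the current word (no gradient guidance in this sketch)."""
    return rng.choice(VOCAB)


def proposal_prob(new, old):
    """Probability g(new | old) of the uniform proposal above."""
    return 1.0 / len(VOCAB)


def metropolis_hastings(init, steps, rng):
    """Run Metropolis-Hastings and return the visited states."""
    x = init
    samples = []
    for _ in range(steps):
        cand = propose(x, rng)
        # Acceptance ratio: [pi(x') g(x | x')] / [pi(x) g(x' | x)]
        num = target(cand) * proposal_prob(x, cand)
        den = target(x) * proposal_prob(cand, x)
        alpha = min(1.0, num / den)
        if rng.random() < alpha:
            x = cand  # accept the candidate
        samples.append(x)  # on rejection the current state is repeated
    return samples


samples = metropolis_hastings("good", 20000, random.Random(0))
counts = Counter(samples)
freqs = {w: counts[w] / len(samples) for w in VOCAB}
print(freqs)  # frequencies should be roughly proportional to WEIGHTS
```

The acceptance step is what lets the chain settle on the target distribution even though the proposal knows nothing about it. A better-informed proposal, such as the gradient-guided one mentioned in the abstract, would change only `propose` and `proposal_prob` in this sketch.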
