Paper Title

Towards Evaluating the Robustness of Chinese BERT Classifiers

Paper Authors

Boxin Wang, Boyuan Pan, Xin Li, Bo Li

Abstract

Recent advances in large-scale language representation models such as BERT have improved the state-of-the-art performance on many NLP tasks. Meanwhile, character-level Chinese NLP models, including BERT for Chinese, have also demonstrated that they can outperform existing models. In this paper, we show, however, that such BERT-based models are vulnerable to character-level adversarial attacks. We propose a novel Chinese character-level attack method against BERT-based classifiers. Essentially, we generate "small" perturbations at the character level in the embedding space and use them to guide the character substitution procedure. Extensive experiments show that, under the proposed attack, the classification accuracy on a Chinese news dataset drops from 91.8% to 0% by manipulating fewer than 2 characters on average. Human evaluations also confirm that our generated Chinese adversarial examples barely affect human performance on these NLP tasks.
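The abstract describes guiding character substitution with gradient information in the embedding space. As a minimal sketch of that general idea (not the authors' actual method), the snippet below shows a HotFlip-style first-order criterion: given a toy character embedding matrix and the loss gradient at one character position, it ranks candidate substitute characters by how much the swap is estimated to increase the loss. The embedding matrix `E`, gradient `g`, and vocabulary size are made-up placeholders.

```python
import numpy as np

def pick_substitute(embeddings, char_idx, grad, top_k=1):
    """Gradient-guided character substitution (first-order sketch).

    embeddings: (V, d) character embedding matrix (toy values here)
    char_idx:   vocabulary index of the original character
    grad:       (d,) gradient of the loss w.r.t. that character's embedding
    Returns the indices of the top_k substitute characters that most
    increase the loss under a first-order (linear) approximation.
    """
    # Estimated change in loss when swapping char i for char j:
    # grad . (e_j - e_i); we maximize this over j != i.
    delta = (embeddings - embeddings[char_idx]) @ grad
    delta[char_idx] = -np.inf  # never "substitute" the character for itself
    return np.argsort(delta)[::-1][:top_k]

# Toy example: 5-character vocabulary, 3-dimensional embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))   # hypothetical embedding matrix
g = rng.normal(size=3)        # hypothetical loss gradient at position i
candidates = pick_substitute(E, 2, g, top_k=2)
print(candidates)
```

In a real attack the gradient would come from backpropagating the classifier's loss to the input embeddings, and the candidate set would typically be restricted (e.g. to visually or phonetically similar characters) so that the perturbation stays "small" in the sense the abstract describes.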
