TCAB: A Large-Scale Text Classification Attack Benchmark

Paper Title

TCAB: A Large-Scale Text Classification Attack Benchmark

Authors

Kalyani Asthana, Zhouhang Xie, Wencong You, Adam Noack, Jonathan Brophy, Sameer Singh, Daniel Lowd

Abstract

We introduce the Text Classification Attack Benchmark (TCAB), a dataset for analyzing, understanding, detecting, and labeling adversarial attacks against text classifiers. TCAB includes 1.5 million attack instances, generated by twelve adversarial attacks targeting three classifiers trained on six source datasets for sentiment analysis and abuse detection in English. Unlike standard text classification, text attacks must be understood in the context of the target classifier that is being attacked, and thus features of the target classifier are important as well. TCAB includes all attack instances that are successful in flipping the predicted label; a subset of the attacks is also labeled by human annotators to determine how frequently the primary semantics are preserved. The process of generating attacks is automated, so that TCAB can easily be extended to incorporate new text attacks and better classifiers as they are developed. In addition to the primary tasks of detecting and labeling attacks, TCAB can also be used for attack localization, attack target labeling, and attack characterization. TCAB code and dataset are available at https://react-nlp.github.io/tcab/.
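The inclusion criterion in the abstract (an attack counts as successful when it flips the predicted label) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `AttackInstance` record, its field names, and the `toy_predict` classifier are hypothetical and do not reflect TCAB's actual schema, code, or target models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttackInstance:
    # Hypothetical record layout; TCAB's real schema may differ.
    original_text: str
    perturbed_text: str
    attack_name: str


def successful_attacks(
    instances: List[AttackInstance],
    predict: Callable[[str], int],
) -> List[AttackInstance]:
    """Keep instances whose perturbation changes the classifier's predicted label."""
    return [
        inst
        for inst in instances
        if predict(inst.original_text) != predict(inst.perturbed_text)
    ]


def toy_predict(text: str) -> int:
    # Toy stand-in for a target classifier: 1 if the word "good" appears, else 0.
    return int("good" in text.lower().split())


candidates = [
    AttackInstance("a good movie", "a g00d movie", "char_swap"),
    AttackInstance("a good movie", "a good film", "synonym_swap"),
]
kept = successful_attacks(candidates, toy_predict)
```

Because success is defined relative to `predict`, the same perturbed text may count as an attack against one classifier and not against another, which is why the abstract stresses the context of the target classifier.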
