Paper Title

Data-Driven Mitigation of Adversarial Text Perturbation

Paper Authors

Rasika Bhalerao, Mohammad Al-Rubaie, Anand Bhaskar, Igor Markov

Paper Abstract

Social networks have become an indispensable part of our lives, with billions of people producing ever-increasing amounts of text. At such scales, content policies and their enforcement become paramount. To automate moderation, questionable content is detected by Natural Language Processing (NLP) classifiers. However, high-performance classifiers are hampered by misspellings and adversarial text perturbations. In this paper, we classify intentional and unintentional adversarial text perturbation into ten types and propose a deobfuscation pipeline to make NLP models robust to such perturbations. We propose Continuous Word2Vec (CW2V), our data-driven method to learn word embeddings that ensures that perturbations of words have embeddings similar to those of the original words. We show that CW2V embeddings are generally more robust to text perturbations than embeddings based on character ngrams. Our robust classification pipeline combines deobfuscation and classification, using proposed defense methods and word embeddings to classify whether Facebook posts are requesting engagement such as likes. Our pipeline results in engagement bait classification that goes from 0.70 to 0.67 AUC with adversarial text perturbation, while character ngram-based word embedding methods result in downstream classification that goes from 0.76 to 0.64.
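
The abstract's core robustness criterion is that a perturbed word's embedding should stay close to the original word's embedding. The sketch below is purely illustrative and is not the paper's CW2V method: it trains a character n-gram model (gensim FastText, the kind of baseline the abstract compares against) on a tiny made-up corpus, applies one hypothetical leetspeak-style perturbation, and reports the cosine similarity between the original and perturbed embeddings. The corpus, the perturbation function, and all hyperparameters are assumptions for illustration only.

```python
# Illustrative sketch only: measures how far a word's embedding moves under a
# simple character-level perturbation, using a character n-gram model
# (gensim FastText) as a stand-in. This is NOT the paper's CW2V method;
# the corpus, perturbation, and hyperparameters are toy assumptions.
from gensim.models import FastText

# Toy corpus (assumption); in practice this would be a large text corpus.
corpus = [
    ["please", "like", "and", "share", "this", "post"],
    ["tag", "a", "friend", "who", "needs", "to", "see", "this"],
    ["comment", "below", "if", "you", "agree"],
    ["like", "this", "post", "to", "win", "a", "prize"],
]

# Character n-gram embeddings: subword n-grams let the model embed
# out-of-vocabulary (perturbed) tokens.
model = FastText(
    sentences=corpus,
    vector_size=32,
    window=3,
    min_count=1,
    epochs=50,
    min_n=2,
    max_n=4,
    seed=0,
)

def perturb(word: str) -> str:
    """One hypothetical perturbation: leetspeak-style character substitution."""
    return word.replace("i", "1").replace("e", "3")

original = "like"
perturbed = perturb(original)  # "l1k3"

# Cosine similarity between the original and perturbed embeddings; a
# perturbation-robust embedding method should keep this close to 1.0.
sim = model.wv.similarity(original, perturbed)
print(f"{original} vs {perturbed}: cosine similarity = {sim:.3f}")
```

Under the abstract's framing, a more robust embedding method such as CW2V would keep this similarity high across all ten perturbation types, which in turn limits the AUC drop of the downstream engagement bait classifier.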
