Machine Generation and Detection of Arabic Manipulated and Fake News

Paper Title

Machine Generation and Detection of Arabic Manipulated and Fake News

Authors

El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Tariq Alhindi, Hasan Cavusoglu

Abstract

Fake news and deceptive machine-generated text are serious problems threatening modern societies, including in the Arab world. This motivates work on detecting false and manipulated stories online. However, a bottleneck for this research is the lack of sufficient data to train detection models. We present a novel method for automatically generating Arabic manipulated (and potentially fake) news stories. Our method is simple and depends only on the availability of true stories, which are abundant online, and a part-of-speech (POS) tagger. To facilitate future work, we dispense with both of these requirements altogether by providing AraNews, a novel and large POS-tagged news dataset that can be used off-the-shelf. Using stories generated based on AraNews, we carry out a human annotation study that casts light on the effects of machine manipulation on text veracity. The study also measures human ability to detect Arabic machine-manipulated text generated by our method. Finally, we develop the first models for detecting manipulated Arabic news and achieve state-of-the-art results on Arabic fake news detection (macro F1 = 70.06). Our models and data are publicly available.
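For readers unfamiliar with the metric quoted in the abstract, macro F1 is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority one. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `macro_f1` and the toy labels are illustrative assumptions. It returns a fraction in [0, 1], whereas the abstract's figure appears to be on a 0-100 scale.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (hypothetical helper, standard definition)."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("macro F1 is undefined for empty inputs")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented purely for illustration
gold = ["true", "true", "fake", "fake", "fake"]
pred = ["true", "fake", "fake", "fake", "true"]
score = macro_f1(gold, pred)
```

On the toy data, the "true" class has F1 = 1/2 and the "fake" class has F1 = 2/3, so the macro average is 7/12.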
