Arabic Offensive Language on Twitter: Analysis and Experiments

Paper Title

Arabic Offensive Language on Twitter: Analysis and Experiments

Authors

Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali

Abstract

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct many experiments to produce strong results (F1 = 83.2) on the dataset using SOTA techniques.
