改进了基于深度神经网络的Twitter的两阶段仇恨言语分类

论文标题

改进了基于深度神经网络的Twitter的两阶段仇恨言语分类

Improved two-stage hate speech classification for twitter based on Deep Neural Networks

论文作者

Pitsilis, Georgios K.

论文摘要

仇恨言论是一种在线骚扰的一种形式，涉及使用滥用语言，并且在社交媒体帖子中通常可以看到。这种骚扰主要集中在诸如宗教，性别，种族等的特定群体特征上，如今它既有社会和经济后果。文本文章中对滥用语言的自动检测一直是一项艰巨的任务，但是最近它从科学界获得了很多兴趣。本文解决了在社交媒体中辨别仇恨内容的重要问题。我们在这项工作中提出的模型是基于LSTM神经网络体系结构的现有方法的扩展，我们在短文中适当地增强和微调以检测某些形式的仇恨语言，例如种族主义或性别歧视。最重要的增强是转换为由复发神经网络（RNN）分类器组成的两阶段方案。将第一阶段的所有一Vs式分类器（OVR）分类器的输出组合在一起，并用于训练第二阶段分类器，这最终决定了骚扰的类型。我们的研究包括对在16K推文的公共语料库中评估的第二阶段提出的几种替代方法的性能比较，然后对另一个数据集进行了概括研究。报告的结果表明，与当前的最新面前相比，在仇恨言论检测任务中，所提出的方案的分类质量出色。

Hate speech is a form of online harassment that involves the use of abusive language, and it is commonly seen in social media posts. This sort of harassment mainly focuses on specific group characteristics such as religion, gender, ethnicity, etc and it has both societal and economic consequences nowadays. The automatic detection of abusive language in text postings has always been a difficult task, but it is lately receiving much interest from the scientific community. This paper addresses the important problem of discerning hateful content in social media. The model we propose in this work is an extension of an existing approach based on LSTM neural network architectures, which we appropriately enhanced and fine-tuned to detect certain forms of hatred language, such as racism or sexism, in a short text. The most significant enhancement is the conversion to a two-stage scheme consisting of Recurrent Neural Network (RNN) classifiers. The output of all One-vs-Rest (OvR) classifiers from the first stage are combined and used to train the second stage classifier, which finally determines the type of harassment. Our study includes a performance comparison of several proposed alternative methods for the second stage evaluated on a public corpus of 16k tweets, followed by a generalization study on another dataset. The reported results show the superior classification quality of the proposed scheme in the task of hate speech detection as compared to the current state-of-the-art.

下载PDF全文

下载文献需遵守相关版权规定

论文标题