Leveraging Dependency Grammar for Fine-Grained Offensive Language Detection using Graph Convolutional Networks

Paper Title

Leveraging Dependency Grammar for Fine-Grained Offensive Language Detection using Graph Convolutional Networks

Authors

Divyam Goel, Raksha Sharma

Abstract

The last few years have witnessed an exponential rise in the propagation of offensive text on social media. Identifying this text with high precision is crucial for the well-being of society. Most existing approaches tend to give high toxicity scores to innocuous statements (e.g., "I am a gay man"). These false positives result from over-generalization on training data in which specific terms in the statement (e.g., "gay") may have been used in a pejorative sense. Emphasis on such words alone can lead to discrimination against the very classes these systems are designed to protect. In this paper, we address the problem of offensive language detection on Twitter, while also detecting the type and the target of the offence. We propose a novel approach called SyLSTM, which integrates syntactic features, in the form of the dependency parse tree of a sentence, and semantic features, in the form of word embeddings, into a deep learning architecture using a Graph Convolutional Network. Results show that the proposed approach significantly outperforms the state-of-the-art BERT model with orders of magnitude fewer parameters.
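The abstract does not give the details of SyLSTM, so the following is only a minimal sketch of the general idea it names: passing word embeddings through one graph convolution over a sentence's dependency tree. It assumes the commonly used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), which may differ from what the authors use. The toy parse, the 2-dimensional embeddings, the identity weight matrix, and all function names are illustrative assumptions, not taken from the paper.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected dependency graph."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each word keeps its own features
    for head, dependent in edges:
        a[head][dependent] = 1.0
        a[dependent][head] = 1.0  # treat dependency arcs as undirected
    degree = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Plain dense matrix product for lists of lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, h, w):
    """One graph convolution step: ReLU(a_hat @ h @ w)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, value) for value in row] for row in z]


# Hypothetical example: "I am happy", with "happy" (index 2) as the root
# and the other two words attached to it.
tokens = ["I", "am", "happy"]
edges = [(2, 0), (2, 1)]  # (head, dependent) pairs, assumed parse
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up word embeddings
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for readability only

a_hat = normalized_adjacency(len(tokens), edges)
out = gcn_layer(a_hat, h, w)
```

After this step each row of `out` mixes a word's embedding with those of its syntactic neighbours, so "I" and "am" each pick up information from "happy" but not directly from each other. In a real system the parse would come from a dependency parser and `w` would be learned.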
