Anti-Asian Hate Speech Detection via Data Augmented Semantic Relation Inference

Paper title

Anti-Asian Hate Speech Detection via Data Augmented Semantic Relation Inference

Authors

Jiaxuan Li, Yue Ning

Abstract

With the spreading of hate speech on social media in recent years, automatic detection of hate speech is becoming a crucial task and has attracted attention from various communities. This task aims to recognize online posts (e.g., tweets) that contain hateful information. The peculiarities of languages in social media, such as short and poorly written content, lead to the difficulty of learning semantics and capturing discriminative features of hate speech. Previous studies have utilized additional useful resources, such as sentiment hashtags, to improve the performance of hate speech detection. Hashtags are added as input features serving either as sentiment-lexicons or extra context information. However, our close investigation shows that directly leveraging these features without considering their context may introduce noise to classifiers. In this paper, we propose a novel approach to leverage sentiment hashtags to enhance hate speech detection in a natural language inference framework. We design a novel framework SRIC that simultaneously performs two tasks: (1) semantic relation inference between online posts and sentiment hashtags, and (2) sentiment classification on these posts. The semantic relation inference aims to encourage the model to encode sentiment-indicative information into representations of online posts. We conduct extensive experiments on two real-world datasets and demonstrate the effectiveness of our proposed framework compared with state-of-the-art representation learning models.
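The abstract says SRIC performs semantic relation inference and sentiment classification simultaneously, but it does not give the training objective. As a rough illustration only, the sketch below shows one common way to train two classification heads jointly: summing a cross-entropy loss per task. The additive form, the `relation_weight` parameter, the function names, and the label sets are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


def joint_loss(relation_logits, relation_label,
               sentiment_logits, sentiment_label,
               relation_weight=1.0):
    """Hypothetical joint objective: sentiment loss plus weighted relation loss.

    relation_logits: scores for the (post, hashtag) relation classes (assumed).
    sentiment_logits: scores for the post's sentiment classes (assumed).
    relation_weight: assumed hyperparameter balancing the two tasks.
    """
    sentiment_term = cross_entropy(sentiment_logits, sentiment_label)
    relation_term = cross_entropy(relation_logits, relation_label)
    return sentiment_term + relation_weight * relation_term


if __name__ == "__main__":
    # One (post, hashtag) pair with made-up scores for both heads.
    loss = joint_loss(
        relation_logits=[2.0, 0.5], relation_label=0,
        sentiment_logits=[0.1, 1.2], sentiment_label=1,
        relation_weight=0.5,
    )
    print(f"joint loss: {loss:.4f}")
```

In a setup like this, gradients from the relation term would also flow into the shared post encoder, which is one plausible reading of how relation inference could push sentiment-indicative information into the post representations.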

Paper title