Assessing the impact of contextual information in hate speech detection

Paper title

Assessing the impact of contextual information in hate speech detection

Authors

Pérez, Juan Manuel; Luque, Franco; Zayat, Demian; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Gravano, Agustín; Cotik, Viviana

Abstract

In recent years, hate speech has gained great relevance in social networks and other virtual media because of its intensity and its relationship with violent acts against members of protected groups. Due to the great amount of content generated by users, great effort has been made in the research and development of automatic tools to aid the analysis and moderation of this speech, at least in its most threatening forms. One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources are performed on data without context; that is, isolated messages without any type of conversational context or the topic being discussed. This restricts the available information to define if a post on a social network is hateful or not. In this work, we provide a novel corpus for contextualized hate speech detection based on user responses to news posts from media outlets on Twitter. This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic. Classification experiments using state-of-the-art techniques show evidence that adding contextual information improves hate speech detection performance for two proposed tasks (binary and multi-label prediction). We make our code, models, and corpus available for further research.

Paper title