Paper title
Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis
Paper authors
Paper abstract
This paper presents a novel scheme for the annotation of hate speech in corpora of Web 2.0 commentary. The proposed scheme is motivated by the critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta, which was conducted under the auspices of the EU-funded C.O.N.T.A.C.T. project. Based on the realization that hate speech is not a clear-cut category to begin with, appears to belong to a continuum of discriminatory discourse, and is often realized through indirect linguistic means, we argue that annotation schemes for its detection should refrain from directly including the label 'hate speech,' as different annotators might have different thresholds as to what does and does not constitute hate speech. In view of this, we suggest a multi-layer annotation scheme, which is pilot-tested against a binary +/- hate speech classification and appears to yield higher inter-annotator agreement. After motivating the postulation of our scheme, we present the MaNeCo corpus, on which it will eventually be used: a substantial corpus of online newspaper comments spanning 10 years.