Paper Title

Classifying Constructive Comments

Authors

Varada Kolhatkar, Nithum Thain, Jeffrey Sorensen, Lucas Dixon, Maite Taboada

Abstract

We introduce the Constructive Comments Corpus (C3), comprised of 12,000 annotated news comments, intended to help build new tools for online communities to improve the quality of their discussions. We define constructive comments as high-quality comments that make a contribution to the conversation. We explain the crowd worker annotation scheme and define a taxonomy of sub-characteristics of constructiveness. The quality of the annotation scheme and the resulting dataset is evaluated using measurements of inter-annotator agreement, expert assessment of a sample, and by the constructiveness sub-characteristics, which we show provide a proxy for the general constructiveness concept. We provide models for constructiveness trained on C3 using both feature-based and a variety of deep learning approaches and demonstrate that these models capture general rather than topic- or domain-specific characteristics of constructiveness, through domain adaptation experiments. We examine the role that length plays in our models, as comment length could be easily gamed if models depend heavily upon this feature. By examining the errors made by each model and their distribution by length, we show that the best performing models are less correlated with comment length. The constructiveness corpus and our experiments pave the way for a moderation tool focused on promoting comments that make a contribution, rather than only filtering out undesirable content.
