Paper title
Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution
Paper authors
Paper abstract
This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.