Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Paper Title

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Authors

Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

Abstract

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising amount of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns about the methods' general performance and makes it difficult to assess their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup including a new formalization of the annotation error detection task, evaluation protocol and general best practices. To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open source software package.
