Paper Title

Re-Examining Human Annotations for Interpretable NLP

Paper Authors

Cheng-Han Chiang, Hung-yi Lee

Paper Abstract

Explanation methods in Interpretable NLP often explain the model's decision by extracting evidence (rationale) from the input texts supporting the decision. Benchmark datasets for rationales have been released to evaluate how good the rationale is. The ground truth rationales in these datasets are often human annotations obtained via crowd-sourced websites. Valuable as these datasets are, the details on how those human annotations are obtained are often not clearly specified. We conduct comprehensive controlled experiments using crowd-sourced websites on two widely used datasets in Interpretable NLP to understand how those unsaid details can affect the annotation results. Specifically, we compare the annotation results obtained from recruiting workers satisfying different levels of qualification. We also provide high-quality workers with different instructions for completing the same underlying tasks. Our results reveal that the annotation quality is highly subject to the workers' qualification, and workers can be guided to provide certain annotations by the instructions. We further show that specific explanation methods perform better when evaluated using the ground truth rationales obtained by particular instructions. Based on these observations, we highlight the importance of providing complete details of the annotation process and call for careful interpretation of any experiment results obtained using those annotations.
