Paper Title
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Paper Authors
Paper Abstract
While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. The benchmark covers three representative NLP tasks: sentiment analysis, textual similarity, and reading comprehension, each provided with both English and Chinese annotated data. To evaluate interpretability precisely, we provide token-level rationales that are carefully annotated to be sufficient, compact, and comprehensive. We also design a new metric, i.e., the consistency between the rationales before and after perturbations, to uniformly evaluate interpretability across different types of tasks. Based on this benchmark, we conduct experiments on three typical models with three saliency methods and unveil their strengths and weaknesses in terms of interpretability. We will release this benchmark at https://www.luge.ai/#/luge/task/taskDetail?taskId=15 and hope it can facilitate research in building trustworthy systems.
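To make the consistency idea concrete, below is a minimal sketch in Python. It is not the paper's exact formulation; it simply assumes that a rationale is represented as a set of token indices and measures the token-level F1 overlap between the rationale extracted from the original input and the one extracted after a perturbation. The function name `rationale_consistency` and the set-of-indices representation are illustrative assumptions.

```python
# Illustrative sketch only: one simple way to quantify "consistency between
# rationales before and after perturbation" is the token-level F1 overlap of
# the two rationale sets. Token alignment between the original and perturbed
# inputs is assumed to be handled elsewhere.

from typing import Set


def rationale_consistency(before: Set[int], after: Set[int]) -> float:
    """F1 overlap between rationale token indices before/after a perturbation."""
    if not before and not after:
        return 1.0  # both rationales empty: trivially consistent
    overlap = len(before & after)
    if overlap == 0:
        return 0.0
    precision = overlap / len(after)
    recall = overlap / len(before)
    return 2 * precision * recall / (precision + recall)


# Example: the rationale shifts by one token after the perturbation.
print(rationale_consistency({3, 4, 5}, {4, 5, 6}))  # ~0.667
```

A higher score under such a measure indicates that the saliency method points to the same evidence tokens even when the input is perturbed, which is the intuition behind the consistency-based evaluation described in the abstract.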