Paper Title
Does Self-Rationalization Improve Robustness to Spurious Correlations?
Paper Authors
Paper Abstract
Rationalization is fundamental to human reasoning and learning. NLP models trained to produce rationales along with predictions, called self-rationalization models, have been investigated for their interpretability and utility to end-users. However, the extent to which training with human-written rationales facilitates learning remains an under-explored question. We ask whether training models to self-rationalize can aid in their learning to solve tasks for the right reasons. Specifically, we evaluate how training self-rationalization models with free-text rationales affects robustness to spurious correlations in fine-tuned encoder-decoder and decoder-only models of six different sizes. We evaluate robustness to spurious correlations by measuring performance on 1) manually annotated challenge datasets and 2) subsets of original test sets where reliance on spurious correlations would fail to produce correct answers. We find that while self-rationalization can improve robustness to spurious correlations in low-resource settings, it tends to hurt robustness in higher-resource settings. Furthermore, these effects depend on model family and size, as well as on rationale content. Together, our results suggest that explainability can come at the cost of robustness; thus, appropriate care should be taken when training self-rationalizing models with the goal of creating more trustworthy models.
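To make the abstract's two methodological ideas concrete, here is a minimal Python sketch (not the paper's released code): (1) formatting a training example so the model emits a label followed by a free-text rationale, and (2) selecting the subset of a test set where a known spurious shortcut predicts the wrong label, so accuracy on that subset rewards models that learned the task rather than the cue. The NLI task, the WT5-style "explain" prefix, and the lexical-overlap shortcut are illustrative assumptions, not details taken from the abstract.

from dataclasses import dataclass

@dataclass
class Example:
    premise: str
    hypothesis: str
    label: str          # gold label, e.g. "entailment" / "contradiction" / "neutral"
    rationale: str      # human-written free-text rationale

def to_self_rationalization_pair(ex: Example) -> tuple[str, str]:
    """Format one example as (input, target) where the target contains both
    the label and the rationale, so a seq2seq or decoder-only model learns
    to predict and explain jointly."""
    source = f"explain nli premise: {ex.premise} hypothesis: {ex.hypothesis}"
    target = f"{ex.label} explanation: {ex.rationale}"
    return source, target

def shortcut_predicts_entailment(ex: Example, threshold: float = 0.8) -> bool:
    """A toy spurious heuristic: high word overlap between premise and
    hypothesis predicts "entailment"."""
    premise_words = set(ex.premise.lower().split())
    hypothesis_words = set(ex.hypothesis.lower().split())
    overlap = len(premise_words & hypothesis_words) / max(len(hypothesis_words), 1)
    return overlap >= threshold

def spurious_failure_subset(test_set: list[Example]) -> list[Example]:
    """Keep only examples where relying on the shortcut yields the WRONG
    answer; a model that depends on the shortcut will fail here."""
    return [
        ex for ex in test_set
        if shortcut_predicts_entailment(ex) and ex.label != "entailment"
    ]

if __name__ == "__main__":
    ex = Example(
        premise="A man is playing a guitar on stage.",
        hypothesis="A man is playing a guitar.",
        label="entailment",
        rationale="Playing a guitar on stage entails playing a guitar.",
    )
    print(to_self_rationalization_pair(ex))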