Paper Title

Embarrassingly Simple Performance Prediction for Abductive Natural Language Inference

Paper Authors

Emīls Kadiķis, Vaibhav Srivastav, Roman Klinger

Paper Abstract

The task of abductive natural language inference (αNLI), deciding which hypothesis is the more likely explanation for a set of observations, is a particularly difficult type of NLI. Instead of merely determining a causal relationship, it additionally requires common sense to evaluate how reasonable an explanation is. All recent competitive systems build on top of contextualized representations and make use of transformer architectures for learning an NLI model. When somebody is faced with a particular NLI task, they need to select the best available model, which is a time-consuming and resource-intensive endeavour. To solve this practical problem, we propose a simple method for predicting performance without actually fine-tuning the model. We do this by testing how well pre-trained models perform on the αNLI task when sentence embeddings are merely compared with cosine similarity, relative to the performance achieved when a classifier is trained on top of these embeddings. We show that the accuracy of the cosine similarity approach correlates strongly with the accuracy of the classification approach, with a Pearson correlation coefficient of 0.65. Since the similarity computation is orders of magnitude faster on a given dataset (less than a minute vs. hours), our method can lead to significant time savings in the process of model selection.
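
As a rough illustration of the method described in the abstract, the sketch below scores a single αNLI instance by embedding the two observations and the two candidate hypotheses with a pre-trained sentence encoder and choosing the hypothesis whose embedding is more cosine-similar to the observations. This is a minimal sketch under stated assumptions: the encoder name (all-MiniLM-L6-v2), the use of the sentence-transformers library, and the plain concatenation of the two observations are illustrative choices, not the paper's exact setup.

```python
# Minimal sketch: hypothesis selection via cosine similarity of sentence
# embeddings. The encoder choice and the way observations are combined are
# assumptions for illustration, not the paper's exact configuration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical encoder choice

def pick_hypothesis(obs1: str, obs2: str, hyp1: str, hyp2: str) -> int:
    """Return 1 or 2 for the hypothesis more cosine-similar to the observations."""
    context = f"{obs1} {obs2}"  # simple concatenation of the two observations
    ctx_emb, h1_emb, h2_emb = model.encode(
        [context, hyp1, hyp2], convert_to_tensor=True
    )
    sim1 = util.cos_sim(ctx_emb, h1_emb).item()
    sim2 = util.cos_sim(ctx_emb, h2_emb).item()
    return 1 if sim1 >= sim2 else 2

# Example αNLI-style instance (made up for illustration).
print(pick_hypothesis(
    "Dotty was being very grumpy.",
    "She felt much better afterwards.",
    "Dotty ate something that upset her stomach.",
    "Dotty called a close friend to chat.",
))
```

Running this over a labeled development set yields an accuracy for the cosine approach; per the abstract, that number correlates with the accuracy of a classifier fine-tuned on top of the same embeddings (Pearson r = 0.65), which is what makes it useful as a cheap screening signal during model selection.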
