Paper Title
Calibration of Machine Reading Systems at Scale
Paper Authors
Paper Abstract
In typical machine learning systems, an estimate of the probability of the prediction is used to assess the system's confidence in the prediction. This confidence measure is usually uncalibrated; i.e., the system's confidence in the prediction does not match the true probability of the predicted output. In this paper, we present an investigation into calibrating open-setting machine reading systems such as open-domain question answering and claim verification systems. We show that calibrating such complex systems, which contain discrete retrieval and deep reading components, is challenging, and that current calibration techniques fail to scale to these settings. We propose simple extensions to existing calibration approaches that allow us to adapt them to these settings. Our experimental results reveal that the approach works well, and that it can be used to selectively predict answers when question answering systems are posed unanswerable or out-of-training-distribution questions.
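As background for the calibration problem the abstract describes, the sketch below is a minimal, generic illustration (not the paper's proposed method) of how calibration is typically measured with expected calibration error (ECE) and adjusted with temperature scaling. All function names, the example logits, and the correctness labels are illustrative assumptions.

```python
# Minimal sketch, assuming a classifier that outputs logits per candidate answer.
# Not the paper's approach; a generic baseline for measuring/adjusting calibration.
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and average the gap between
    mean confidence and empirical accuracy in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of examples in the bin
    return ece


def temperature_scale(logits, temperature):
    """Softmax over logits / T; T > 1 softens over-confident predictions.
    In practice T is fit on held-out data by minimizing negative log-likelihood."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical example: over-confident logits for two questions.
logits = np.array([[4.0, 1.0, 0.5],
                   [3.5, 3.0, 0.2]])
probs = temperature_scale(logits, temperature=2.0)
confidences = probs.max(axis=-1)
correct = np.array([1, 0])  # whether each top prediction matched the gold answer
print(expected_calibration_error(confidences, correct, n_bins=5))
```

A calibrated confidence of this kind is what allows selective prediction: the system abstains whenever the (recalibrated) confidence falls below a threshold, which is the behavior the paper targets for unanswerable or out-of-distribution questions.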