Paper Title


Calibrated Interpretation: Confidence Estimation in Semantic Parsing

Authors

Elias Stengel-Eskin, Benjamin Van Durme

Abstract


Sequence generation models are increasingly being used to translate natural language into programs, i.e. to perform executable semantic parsing. The fact that semantic parsing aims to predict programs that can lead to executed actions in the real world motivates developing safe systems. This in turn makes measuring calibration -- a central component to safety -- particularly important. We investigate the calibration of popular generation models across four popular semantic parsing datasets, finding that it varies across models and datasets. We then analyze factors associated with calibration error and release new confidence-based challenge splits of two parsing datasets. To facilitate the inclusion of calibration in semantic parsing evaluations, we release a library for computing calibration metrics.
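The abstract mentions a released library for computing calibration metrics. As an illustration of what such a metric looks like, below is a minimal, hypothetical sketch of Expected Calibration Error (ECE), a standard calibration measure: predictions are binned by confidence, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by bin size. This is not the authors' library, just a common formulation of the metric.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE sketch: bin predictions by confidence, then average
    |accuracy - mean confidence| per bin, weighted by bin size.
    Note: this is an illustrative implementation, not the paper's library."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # First bin is closed on the left; all bins closed on the right.
        if i == 0:
            mask = (confidences >= lo) & (confidences <= hi)
        else:
            mask = (confidences > lo) & (confidences <= hi)
        if mask.sum() == 0:
            continue
        acc = correct[mask].mean()        # empirical accuracy in bin
        conf = confidences[mask].mean()   # mean predicted confidence in bin
        ece += (mask.sum() / len(confidences)) * abs(acc - conf)
    return ece

# A model that predicts 0.9 confidence but is right only half the time
# is overconfident: the gap |0.5 - 0.9| = 0.4 shows up directly as ECE.
ece = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

A well-calibrated parser would have confidence scores that match empirical accuracy within each bin, driving ECE toward zero.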
