Paper Title

The Calibration Generalization Gap

Paper Authors

A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran

Paper Abstract

Calibration is a fundamental property of a good predictive model: it requires that the model predicts correctly in proportion to its confidence. Modern neural networks, however, provide no strong guarantees on their calibration -- and can be either poorly calibrated or well-calibrated depending on the setting. It is currently unclear which factors contribute to good calibration (architecture, data augmentation, overparameterization, etc), though various claims exist in the literature. We propose a systematic way to study the calibration error: by decomposing it into (1) calibration error on the train set, and (2) the calibration generalization gap. This mirrors the fundamental decomposition of generalization. We then investigate each of these terms, and give empirical evidence that (1) DNNs are typically always calibrated on their train set, and (2) the calibration generalization gap is upper-bounded by the standard generalization gap. Taken together, this implies that models with small generalization gap (|Test Error - Train Error|) are well-calibrated. This perspective unifies many results in the literature, and suggests that interventions which reduce the generalization gap (such as adding data, using heavy augmentation, or smaller model size) also improve calibration. We thus hope our initial study lays the groundwork for a more systematic and comprehensive understanding of the relation between calibration, generalization, and optimization.
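
As a rough sketch of the decomposition described in the abstract (the notation here is ours, not necessarily the paper's; CalErr denotes a calibration error measure such as ECE):

\[
\underbrace{\mathrm{CalErr}_{\text{test}}}_{\text{test calibration error}}
\;=\;
\underbrace{\mathrm{CalErr}_{\text{train}}}_{\text{(1) train calibration error}}
\;+\;
\underbrace{\bigl(\mathrm{CalErr}_{\text{test}} - \mathrm{CalErr}_{\text{train}}\bigr)}_{\text{(2) calibration generalization gap}}
\]

Under the abstract's two empirical claims (the train calibration error is typically near zero, and the calibration generalization gap is upper-bounded by the standard generalization gap), this gives, roughly,

\[
\mathrm{CalErr}_{\text{test}} \;\lesssim\; \bigl|\,\text{Test Error} - \text{Train Error}\,\bigr|,
\]

which is why a small generalization gap suggests good calibration.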
