Paper Title
A Consistent and Differentiable Lp Canonical Calibration Error Estimator
Paper Authors
Paper Abstract
Calibrated probabilistic classifiers are models whose predicted probabilities can directly be interpreted as uncertainty estimates. It has been shown recently that deep neural networks are poorly calibrated and tend to output overconfident predictions. As a remedy, we propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates, which asymptotically converges to the true $L_p$ calibration error. This novel estimator enables us to tackle the strongest notion of multiclass calibration, called canonical (or distribution) calibration, while other common calibration methods are tractable only for top-label and marginal calibration. The computational complexity of our estimator is $\mathcal{O}(n^2)$, the convergence rate is $\mathcal{O}(n^{-1/2})$, and it is unbiased up to $\mathcal{O}(n^{-2})$, achieved by a geometric series debiasing scheme. In practice, this means that the estimator can be applied to small subsets of data, enabling efficient estimation and mini-batch updates. The proposed method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities based on conditional expectation, such as the sharpness of a probabilistic classifier. Empirical results validate the correctness of our estimator, and demonstrate its utility in canonical calibration error estimation and calibration error regularized risk minimization.
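To make the abstract's description concrete, below is a minimal NumPy/SciPy sketch of a Dirichlet-kernel-based plug-in estimate of the canonical $L_p$ calibration error, under the assumption that the conditional expectation $\mathbb{E}[y \mid f(x)]$ is estimated by a leave-one-out Nadaraya-Watson regression with a Dirichlet kernel of bandwidth $h$ centered at each prediction. The function name, signature, and the specific normalization are illustrative choices of ours, and the $\mathcal{O}(n^{-2})$ geometric-series debiasing scheme mentioned in the abstract is omitted; this is not the authors' released implementation.

```python
import numpy as np
from scipy.special import gammaln


def lp_canonical_calibration_error(probs, labels_onehot, bandwidth=0.1, p=2):
    """Illustrative plug-in estimate of the L_p canonical calibration error.

    probs:         (n, k) predicted probability vectors on the simplex.
    labels_onehot: (n, k) one-hot ground-truth labels.
    bandwidth:     Dirichlet kernel bandwidth h (hypothetical default).
    p:             order of the L_p calibration error.
    """
    n, k = probs.shape
    eps = 1e-12
    probs = np.clip(probs, eps, 1.0)

    # Dirichlet kernel centered at each prediction: alpha_i = f(x_i) / h + 1.
    alphas = probs / bandwidth + 1.0                                      # (n, k)

    # log Dir(f(x_j); alpha_i) for all pairs (i, j), computed in log space
    # for numerical stability. This is the O(n^2) pairwise kernel matrix.
    log_norm = gammaln(alphas.sum(axis=1)) - gammaln(alphas).sum(axis=1)  # (n,)
    log_kernel = log_norm[:, None] + (alphas - 1.0) @ np.log(probs).T     # (n, n)

    # Leave-one-out: exclude each point from its own conditional estimate.
    np.fill_diagonal(log_kernel, -np.inf)
    weights = np.exp(log_kernel - log_kernel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Kernel-regression estimate of E[y | f(x_i)], then the plug-in L_p error.
    cond_expectation = weights @ labels_onehot                            # (n, k)
    per_sample = np.linalg.norm(cond_expectation - probs, ord=p, axis=1) ** p
    return np.mean(per_sample) ** (1.0 / p)
```

Because every operation above is differentiable in `probs`, the same construction can in principle be evaluated on mini-batches and used as a calibration-error regularizer during training, as the abstract suggests; a faithful implementation should follow the paper's exact estimator and debiasing scheme.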