Paper Title
Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks
Paper Authors
Paper Abstract
The point estimates of ReLU classification networks (arguably the most widely used neural network architecture) have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus neither calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian approximations is limited. We theoretically analyze approximate Gaussian distributions on the weights of ReLU networks and show that they fix the overconfidence problem. Furthermore, we show that even a simplistic, and thus cheap, Bayesian approximation fixes these issues. This indicates that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian". These theoretical results validate the use of last-layer Bayesian approximations and motivate a range of fidelity-cost trade-offs. We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
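To make the last sentences concrete, below is a minimal, hypothetical sketch (not the authors' code) of the kind of "a bit Bayesian" procedure the abstract refers to: a last-layer Laplace approximation on a small ReLU classifier, written in PyTorch with synthetic 2-D data. The network, data, prior precision, and the generalized Gauss-Newton (GGN) Hessian approximation are illustrative assumptions; the paper's own construction and experiments may differ.

# A minimal sketch (assumptions: PyTorch, synthetic 2-D data, GGN Hessian, MC predictive).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic two-class data (hypothetical stand-in for a real dataset).
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).long()

# Small ReLU network: feature extractor followed by a linear last layer.
features = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
last = nn.Linear(32, 2)
model = nn.Sequential(features, last)

# MAP training; the weight decay loosely plays the role of a Gaussian prior.
opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = F.cross_entropy(model(X), y)
    loss.backward()
    opt.step()

# Laplace approximation over the last-layer weights only ("a bit Bayesian"):
# p(W | data) is approximated by N(W_MAP, H^{-1}), where H is the GGN
# approximation of the Hessian of the negative log-posterior (bias ignored for brevity).
with torch.no_grad():
    Phi = features(X)              # last-layer features, shape (N, D)
    p = last(Phi).softmax(-1)      # MAP predictive probabilities, shape (N, K)

N, D = Phi.shape
K = p.shape[1]
prior_prec = 1e-3                  # hypothetical prior precision, roughly matching the weight decay
H = prior_prec * torch.eye(D * K)
for n in range(N):
    # Per-example GGN block: kron(Lambda_n, phi_n phi_n^T), where
    # Lambda_n = diag(p_n) - p_n p_n^T is the Hessian of the softmax loss w.r.t. the logits.
    Lam = torch.diag(p[n]) - torch.outer(p[n], p[n])
    H = H + torch.kron(Lam, torch.outer(Phi[n], Phi[n]))

w_map = last.weight.detach().reshape(-1)   # vec(W_MAP), class-major, matching the kron ordering
posterior = torch.distributions.MultivariateNormal(w_map, precision_matrix=H)

def predict(x, n_samples=200):
    """Monte Carlo predictive: average the softmax over sampled last-layer weights."""
    with torch.no_grad():
        phi = features(x)
        Ws = posterior.sample((n_samples,)).reshape(n_samples, K, D)
        probs = torch.stack([(phi @ W.T + last.bias).softmax(-1) for W in Ws])
        return probs.mean(0)

# Far from the training data the MAP point estimate saturates to near-certain
# predictions, while the Laplace-approximated predictive is typically less extreme.
far_away = torch.tensor([[100.0, 100.0]])
print("MAP predictive:    ", model(far_away).softmax(-1))
print("Laplace predictive:", predict(far_away))

Restricting the Gaussian posterior to the last layer keeps the Hessian small (D*K entries per side here), which is what makes this "bit" of Bayesian inference cheap while still tempering confidence far from the training data.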