Paper Title
On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks
Paper Authors
Paper Abstract
Capturing aleatoric uncertainty is a critical part of many machine learning systems. In deep learning, a common approach to this end is to train a neural network to estimate the parameters of a heteroscedastic Gaussian distribution by maximizing the logarithm of the likelihood function under the observed data. In this work, we examine this approach and identify potential hazards associated with the use of log-likelihood in conjunction with gradient-based optimizers. First, we present a synthetic example illustrating how this approach can lead to very poor but stable parameter estimates. Second, we identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue. Third, we present an alternative formulation, termed $β$-NLL, in which each data point's contribution to the loss is weighted by the $β$-exponentiated variance estimate. We show that using an appropriate $β$ largely mitigates the issue in our illustrative example. Fourth, we evaluate this approach on a range of domains and tasks and show that it achieves considerable improvements and performs more robustly concerning hyperparameters, both in predictive RMSE and log-likelihood criteria.
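The $β$-NLL formulation described in the abstract admits a compact implementation. The sketch below is a minimal PyTorch version, assuming a network with separate mean and variance heads; the stop-gradient (detach) on the weighting factor and the illustrative default `beta=0.5` are assumptions not spelled out in the abstract.

```python
import torch

def beta_nll_loss(mean, variance, target, beta=0.5):
    """Heteroscedastic Gaussian NLL with beta-NLL weighting.

    Each data point's NLL is multiplied by its predicted variance raised
    to the power beta. beta = 0 recovers the standard heteroscedastic NLL.
    """
    # Per-point Gaussian negative log-likelihood, up to the constant 0.5 * log(2 * pi).
    nll = 0.5 * (torch.log(variance) + (target - mean) ** 2 / variance)
    # Variance-dependent weight; detach() keeps the weighting factor itself
    # out of the gradient computation (an implementation assumption, see lead-in).
    weight = variance.detach() ** beta
    return (weight * nll).mean()
```

In use, `mean` and `variance` would come from a network with a positive variance output (e.g. via softplus or the exponential of a log-variance head). Under this stop-gradient convention, $β = 0$ leaves the loss identical to the ordinary NLL, while at $β = 1$ the gradient with respect to the mean reduces to the standard MSE gradient, so $β$ trades off between the two regimes.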