Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation

Paper title

Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation

Authors

Lu, Zhiyun; Ie, Eugene; Sha, Fei

Abstract

Many methods have been proposed to quantify the predictive uncertainty associated with the outputs of deep neural networks. Among them, ensemble methods often lead to state-of-the-art results, though they require modifications to the training procedures and are computationally costly for both training and inference. In this paper, we propose a new single-model based approach. The main idea is inspired by the observation that we can "simulate" an ensemble of models by drawing from a Gaussian distribution, with a form similar to those from the asymptotic normality theory, infinitesimal Jackknife, Laplacian approximation to Bayesian neural networks, and trajectories in stochastic gradient descents. However, instead of using each model in the "ensemble" to predict and then aggregating their predictions, we integrate the Gaussian distribution and the softmax outputs of the neural networks. We use a mean-field approximation formula to compute this analytically intractable integral. The proposed approach has several appealing properties: it functions as an ensemble without requiring multiple models, and it enables closed-form approximate inference using only the first and second moments of the Gaussian. Empirically, the proposed approach performs competitively when compared to state-of-the-art methods, including deep ensembles, temperature scaling, dropout and Bayesian NNs, on standard uncertainty estimation tasks. It also outperforms many methods on out-of-distribution detection.
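
To make the idea of integrating a softmax against a Gaussian more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the logits a follow N(mu, cov) and uses one common mean-field style approximation: rewrite softmax_k as 1 / sum_i exp(-(a_k - a_i)), then shrink each mean difference by sqrt(1 + lam * Var[a_k - a_i]) with lam = pi/8 (the usual probit-style constant for Gaussian-sigmoid integrals). The function names, the value of lam, and the final renormalization are assumptions made for illustration; the abstract does not state the exact formula. A Monte Carlo estimate is included only as a reference to compare against.

```python
import numpy as np


def mean_field_softmax(mu, cov, lam=np.pi / 8):
    """Approximate E[softmax(a)] for a ~ N(mu, cov) in closed form.

    Hypothetical helper: uses only the first and second moments of the
    Gaussian. lam = pi/8 is an assumed constant, not taken from the abstract.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    var = np.diag(cov)
    # Mean and variance of every pairwise logit difference a_k - a_i.
    diff_mu = mu[:, None] - mu[None, :]
    diff_var = var[:, None] + var[None, :] - 2.0 * cov
    # Shrink each mean difference according to its uncertainty.
    scaled = diff_mu / np.sqrt(1.0 + lam * diff_var)
    # Row k gives 1 / sum_i exp(-(scaled difference)); the i == k term is 1.
    p = 1.0 / np.exp(-scaled).sum(axis=1)
    # The approximate values need not sum exactly to 1, so renormalize
    # (a choice made in this sketch).
    return p / p.sum()


def monte_carlo_softmax(mu, cov, n_samples=100_000, seed=0):
    """Reference estimate of E[softmax(a)] by sampling logits."""
    rng = np.random.default_rng(seed)
    a = rng.multivariate_normal(mu, cov, size=n_samples)
    a = a - a.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(a)
    return (e / e.sum(axis=1, keepdims=True)).mean(axis=0)


if __name__ == "__main__":
    mu = np.array([2.0, 0.0, -1.0])
    cov = 4.0 * np.eye(3)
    print("mean-field :", mean_field_softmax(mu, cov))
    print("monte carlo:", monte_carlo_softmax(mu, cov))
```

With zero covariance the sketch reduces to the ordinary softmax of the mean logits, and increasing the variance flattens the predicted distribution, which is the qualitative behavior one would expect from averaging over an ensemble.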
