Paper Title
Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks
Paper Authors
Paper Abstract
In this thesis, we disentangle the generalized Gauss-Newton method and approximate inference for Bayesian deep learning. The generalized Gauss-Newton method is an optimization method that is used in several popular Bayesian deep learning algorithms. Algorithms that combine the Gauss-Newton method with Laplace and Gaussian variational approximations have recently led to state-of-the-art results in Bayesian deep learning. While Laplace and Gaussian variational approximations have been studied extensively, their interplay with the Gauss-Newton method remains unclear. Recent criticism of priors and posterior approximations in Bayesian deep learning further urges the need for a deeper understanding of practical algorithms. The individual analysis of the Gauss-Newton method and of Laplace and Gaussian variational approximations for neural networks provides both theoretical insight and new practical algorithms. We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly. In particular, the combination of the Gauss-Newton method with approximate inference can be cast as inference in a linear or Gaussian process model. Laplace and Gaussian variational approximations can subsequently provide a posterior approximation to these simplified models. This new disentangled understanding of recent Bayesian deep learning algorithms also leads to new methods. First, the connection to Gaussian processes enables new function-space inference algorithms. Second, we present a marginal likelihood approximation of the underlying probabilistic model to tune neural network hyperparameters. Finally, the identified underlying models lead to different methods for computing predictive distributions. In fact, we find that these prediction methods for Bayesian neural networks often work better than the default choice and solve a common issue with the Laplace approximation.
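As a reading aid, the following is a minimal sketch of the generalized Gauss-Newton (GGN) approximation and its link to a linearized model; the notation is generic and not taken from the thesis itself. For a network f(x; θ) with likelihood p(y | f(x; θ)), the GGN replaces the Hessian of the negative log-likelihood with a Jacobian-based surrogate:

\[
\nabla^2_{\theta}\big[-\log p(\mathcal{D}\mid\theta)\big]\;\approx\;\sum_{n} J_n^{\top}\Lambda_n J_n,
\qquad
J_n=\frac{\partial f(x_n;\theta)}{\partial\theta}\Big|_{\theta_*},\quad
\Lambda_n=-\nabla^2_{f}\log p(y_n\mid f)\Big|_{f=f(x_n;\theta_*)}.
\]

This surrogate is exactly the curvature of the locally linearized model

\[
f_{\mathrm{lin}}(x;\theta)=f(x;\theta_*)+J(x)\big|_{\theta_*}(\theta-\theta_*),
\]

so a GGN-based Laplace or Gaussian variational approximation can be read as inference in a generalized linear model or, equivalently, in a Gaussian process with kernel \(\kappa(x,x')=J(x)\,\Sigma_0\,J(x')^{\top}\) under a Gaussian prior \(\theta\sim\mathcal{N}(\mu_0,\Sigma_0)\). This is the standard GGN-linearization identity from the Bayesian deep learning literature, stated here only to make the abstract's central claim concrete.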