Paper title
Compromise-free Bayesian neural networks
Paper authors
Paper abstract
We conduct a thorough analysis of the relationship between the out-of-sample performance and the Bayesian evidence (marginal likelihood) of Bayesian neural networks (BNNs), and also examine the performance of ensembles of BNNs, both using the Boston housing dataset. Using state-of-the-art nested sampling methods, we numerically sample the full (non-Gaussian and multimodal) network posterior and obtain numerical estimates of the Bayesian evidence, considering network models with up to 156 trainable parameters. The networks have between zero and four hidden layers, use either $\tanh$ or $ReLU$ activation functions, and are set up both with and without hierarchical priors. The ensembles of BNNs are obtained by determining the posterior distribution over networks from the posterior samples of the individual BNNs, re-weighted by the associated Bayesian evidence values. There is good correlation between out-of-sample performance and evidence, as well as a remarkable symmetry between the evidence-versus-model-size and out-of-sample-performance-versus-model-size planes. Networks with $ReLU$ activation functions have consistently higher evidences than those with $\tanh$ functions, and this is reflected in their out-of-sample performance. Ensembling over architectures acts to further improve performance relative to the individual BNNs.
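The architecture ensembling described in the abstract amounts to Bayesian model averaging: posterior predictions from each BNN are combined with weights proportional to that network's Bayesian evidence. A minimal sketch of this weighting step is below; the log-evidence values and per-model predictions are made-up placeholders, not results from the paper, and in practice the log-evidences would come from nested sampling runs (e.g. PolyChord or a similar sampler) over each architecture.

```python
import numpy as np

# Hypothetical log-evidences (log Z) from nested sampling runs for three
# candidate BNN architectures (illustrative values only).
log_Z = np.array([-120.0, -118.5, -125.0])

# Posterior model probabilities under equal prior model odds:
# p(M_k | D) proportional to Z_k, computed stably by shifting by max(log Z)
# before exponentiating (a log-sum-exp trick).
w = np.exp(log_Z - log_Z.max())
w /= w.sum()

# Hypothetical posterior-predictive means from each model at two test
# inputs (shape: n_models x n_test); in practice these are averages over
# the posterior samples of each individual BNN.
preds = np.array([
    [22.1, 30.4],
    [21.8, 31.0],
    [23.0, 29.5],
])

# Evidence-weighted ensemble prediction (Bayesian model averaging).
ensemble_pred = w @ preds
print(w, ensemble_pred)
```

The model with the highest evidence dominates the weights, but lower-evidence architectures still contribute, which is how ensembling over architectures can improve on any single BNN.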