Paper title
Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison
Paper authors
Paper abstract
It is useful to estimate the expected predictive performance of models planned to be used for prediction. We focus on leave-one-out cross-validation (LOO-CV), which has become a popular method for estimating the predictive performance of Bayesian models. Given two models, we are interested in comparing their predictive performance and the associated uncertainty, which can also be used to compute the probability that one model has better predictive performance than the other. We study the properties of the Bayesian LOO-CV estimator and the related uncertainty quantification for the predictive performance difference, and analyse when a normal approximation of this uncertainty is well calibrated and whether taking higher moments into account could improve the approximation. We provide new results on these properties, both theoretically for the linear regression case and empirically for hierarchical linear, latent linear, and spline models, and discuss the challenges. We show that problematic cases include comparing models with similar predictions, misspecified models, and small data. In these cases, there is only a weak connection between the distributions of the LOO-CV estimator and its error. We show that the problematic skewness of the error distribution for the difference, which occurs when the models make similar predictions, does not fade away in certain situations as the data size grows to infinity. Based on the results, we also provide some practical recommendations for users of Bayesian LOO-CV for comparing the predictive performance of models.
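The normal approximation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that pointwise LOO-CV log predictive densities have already been computed for both models on the same observations, and it uses the commonly used standard error estimate based on the sample variance of the pointwise differences. The function name `compare_loo` and the example numbers are hypothetical.

```python
import math

import numpy as np


def compare_loo(elpd_a, elpd_b):
    """Summarise the LOO-CV difference between models A and B.

    elpd_a, elpd_b: pointwise LOO log predictive densities, one value per
    observation, in the same order for both models (assumed inputs).
    Returns (difference, standard error, approximate P(A better than B)).
    """
    d = np.asarray(elpd_a, dtype=float) - np.asarray(elpd_b, dtype=float)
    n = d.size
    diff = float(d.sum())
    # Standard error of the sum of n pointwise differences.
    se = math.sqrt(n * float(d.var(ddof=1)))
    if se == 0.0:
        # Degenerate case: all pointwise differences are identical.
        return diff, se, float(diff > 0)
    # Normal approximation: P(difference > 0) = Phi(diff / se).
    p_a_better = 0.5 * (1.0 + math.erf(diff / (se * math.sqrt(2.0))))
    return diff, se, p_a_better


# Hypothetical pointwise values for four observations.
diff, se, p = compare_loo([-1.0, -1.2, -0.8, -1.1], [-1.5, -1.0, -1.3, -1.4])
print(f"difference={diff:.3f}, se={se:.3f}, P(A better)={p:.3f}")
```

According to the abstract, this kind of summary can be poorly calibrated when the models make similar predictions, when the models are misspecified, or when the data set is small, so the resulting probability should be read with that caveat in mind.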