Paper Title
Multi-Output Gaussian Processes for Multi-Population Longevity Modeling
Paper Authors
Abstract
We investigate joint modeling of longevity trends using the spatial statistical framework of Gaussian Process regression. Our analysis is motivated by the Human Mortality Database (HMD), which provides unified raw mortality tables for nearly 40 countries. Yet few stochastic models exist for handling more than two populations at a time. To bridge this gap, we leverage a spatial covariance framework from machine learning that treats populations as distinct levels of a factor covariate, explicitly capturing the cross-population dependence. The proposed multi-output Gaussian Process models scale straightforwardly to a dozen populations and intrinsically generate coherent joint longevity scenarios. In numerous case studies we investigate the predictive gains from aggregating mortality experience across nations and genders, including by borrowing the most recently available "foreign" data. We show that in our approach, information fusion leads to more precise (and statistically more credible) forecasts. We implement our models in \texttt{R}, as well as a Bayesian version in \texttt{Stan} that provides further uncertainty quantification regarding the estimated mortality covariance structure. All examples utilize public HMD datasets.
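To make the idea of treating populations as levels of a factor covariate concrete, the following is a minimal sketch in Python with NumPy, not the authors' \texttt{R} or \texttt{Stan} implementation. It assumes a separable kernel, namely a squared-exponential kernel over (age, year) multiplied by a cross-population covariance matrix `B`; the abstract does not specify the kernel form, so this choice is an assumption. The data are synthetic rather than HMD tables, and the names `sq_exp`, `multi_pop_kernel`, `gp_predict`, and `cross_cov`, as well as all hyperparameter values, are hypothetical.

```python
import numpy as np

def sq_exp(x1, x2, lengthscales):
    """Squared-exponential kernel over (age, year) inputs of shape (n, 2) and (m, 2)."""
    d = (x1[:, None, :] - x2[None, :, :]) / lengthscales
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1))

def multi_pop_kernel(x1, p1, x2, p2, lengthscales, B):
    """Assumed separable form: k((x, p), (x', p')) = B[p, p'] * k_se(x, x')."""
    return B[np.ix_(p1, p2)] * sq_exp(x1, x2, lengthscales)

def gp_predict(x_tr, p_tr, y_tr, x_te, p_te, lengthscales, B, noise_var):
    """Standard GP posterior mean and covariance for a zero-mean prior."""
    K = multi_pop_kernel(x_tr, p_tr, x_tr, p_tr, lengthscales, B)
    K = K + noise_var * np.eye(len(y_tr))
    K_s = multi_pop_kernel(x_te, p_te, x_tr, p_tr, lengthscales, B)
    K_ss = multi_pop_kernel(x_te, p_te, x_te, p_te, lengthscales, B)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    v = np.linalg.solve(L, K_s.T)
    return K_s @ alpha, K_ss - v.T @ v

def make_grid(ages, years, pop):
    """All (age, year) pairs for one population, plus its integer population label."""
    a, t = np.meshgrid(ages, years, indexing="ij")
    x = np.column_stack([a.ravel(), t.ravel()]).astype(float)
    return x, np.full(len(x), pop, dtype=int)

def cross_cov(rho):
    """2x2 cross-population covariance with unit variances and correlation rho."""
    return np.array([[1.0, rho], [rho, 1.0]])

# Synthetic log-mortality surfaces (illustrative only, not HMD data).
# Population 0 is observed through 2010; population 1 only through 2008.
rng = np.random.default_rng(0)
ages = np.arange(60, 71)
x0, p0 = make_grid(ages, np.arange(2000, 2011), 0)
x1, p1 = make_grid(ages, np.arange(2000, 2009), 1)
x_tr = np.vstack([x0, x1])
p_tr = np.concatenate([p0, p1])
y_tr = (-4.0 + 0.09 * (x_tr[:, 0] - 60) - 0.015 * (x_tr[:, 1] - 2000)
        + 0.1 * p_tr + 0.05 * rng.standard_normal(len(p_tr)))
y_mean = y_tr.mean()  # center the data so the zero-mean prior is reasonable

# Predict population 1 at age 65 in 2010, two years past its last observation.
x_te = np.array([[65.0, 2010.0]])
p_te = np.array([1])
lengthscales = np.array([10.0, 5.0])  # hypothetical (age, year) lengthscales
noise_var = 0.05 ** 2

mean_joint, cov_joint = gp_predict(x_tr, p_tr, y_tr - y_mean, x_te, p_te,
                                   lengthscales, cross_cov(0.9), noise_var)
mean_indep, cov_indep = gp_predict(x_tr, p_tr, y_tr - y_mean, x_te, p_te,
                                   lengthscales, cross_cov(0.0), noise_var)
var_joint = cov_joint[0, 0]
var_indep = cov_indep[0, 0]
print("joint predictive variance:      ", var_joint)
print("independent predictive variance:", var_indep)
```

With `rho = 0` the two populations decouple, so the more recent data from population 0 cannot inform the forecast for population 1. With `rho = 0.9` those observations enter the posterior, which is one simple way to picture the borrowing of "foreign" data described in the abstract. In a full analysis `B`, the lengthscales, and the noise variance would be estimated from data rather than fixed by hand.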