Paper title
A scalable and flexible Cox proportional hazards model for high-dimensional survival prediction and functional selection
Paper authors
Abstract
The Cox proportional hazards (PH) model is one of the most popular models in biomedical data analysis. There have been continuing efforts to make such models more flexible for complex signal detection, for example via additive functions. Nevertheless, extending Cox additive models to accommodate high-dimensional data is nontrivial. When estimating additive functions, commonly used group sparse regularization may introduce excess smoothing shrinkage on the additive functions, damaging predictive performance. Moreover, an "all-in-all-out" approach to functional selection makes it difficult to determine whether nonlinear effects exist. We develop an additive Cox PH model to address these challenges in high-dimensional data analysis. Notably, we impose a novel spike-and-slab LASSO prior that motivates bi-level functional selection on the additive functions. A deterministic algorithm, EM-Coordinate Descent, is designed for scalable model fitting. We compare its predictive and computational performance against state-of-the-art models in simulation studies and a metabolomics data analysis. The proposed model is broadly applicable to various fields of research, e.g. genomics and population health, via the freely available R package BHAM (https://boyiguo1.github.io/BHAM/).
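For orientation, the two ingredients named in the abstract can be written in their generic textbook forms. This is a sketch only: the abstract does not give the paper's own notation or parameterization, so the symbols below ($h_0$, $f_j$, $\beta_j$, $\gamma_j$, $s_0$, $s_1$) are assumptions introduced here for illustration, and the paper's exact prior on the additive functions may differ.

```latex
% Additive Cox PH model: the log-hazard ratio is a sum of smooth
% functions f_j of the individual predictors x_j (generic form).
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{p} f_j(x_j)\Big)

% Generic spike-and-slab LASSO prior on a coefficient \beta_j:
% a mixture of two double-exponential (Laplace) densities, a narrow
% "spike" with scale s_0 and a wide "slab" with scale s_1,
% selected by the latent indicator \gamma_j \in \{0, 1\}.
\beta_j \mid \gamma_j \sim (1-\gamma_j)\,\mathrm{DE}(0, s_0)
                         + \gamma_j\,\mathrm{DE}(0, s_1),
\qquad 0 < s_0 < s_1,

% where DE(0, s) denotes the density
\mathrm{DE}(\beta \mid 0, s) = \frac{1}{2s}\exp\!\Big(-\frac{|\beta|}{s}\Big).
```

In this generic form, a small $s_0$ shrinks irrelevant coefficients strongly toward zero while a larger $s_1$ leaves relevant ones comparatively unpenalized, which is the usual motivation for using such a prior in place of a single-penalty LASSO.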