Paper title
Identification of prognostic and predictive biomarkers in high-dimensional data with PPLasso
Authors
Abstract
In clinical trials, the identification of prognostic and predictive biomarkers is essential to precision medicine. Prognostic biomarkers can be useful for preventing the occurrence of the disease, and predictive biomarkers can be used to identify patients who may benefit from the treatment. Previous research has mainly focused on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso (Prognostic Predictive Lasso), which integrates prognostic and predictive effects into one statistical model. PPLasso also takes into account the correlations between biomarkers, which can alter biomarker selection accuracy. Our method consists of transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso approach on both prognostic and predictive biomarker identification in various scenarios. Finally, our method is applied to publicly available transcriptomic data from the RV144 clinical trial. Our method is implemented in the PPLasso R package available from the Comprehensive R Archive Network (CRAN).
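The decorrelation idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the PPLasso algorithm or the API of the PPLasso R package: it uses a plain Lasso instead of the generalized Lasso, a single coefficient vector instead of separate prognostic and predictive effects, and a low-dimensional toy setting (n > p) so that the sample covariance is invertible. All function names, the simulated data, and the penalty value are assumptions made for illustration only.

```python
import numpy as np

def whiten(X):
    """Center X and remove column correlations using the sample covariance.

    Returns the whitened matrix and the inverse square root of the
    sample covariance (assumed invertible, i.e. n > p in this sketch).
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / Xc.shape[0]
    evals, evecs = np.linalg.eigh(S)          # S is symmetric
    S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return Xc @ S_inv_sqrt, S_inv_sqrt

def soft_threshold(z, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_orthonormal(X_tilde, y, lam):
    """Lasso solution when X_tilde.T @ X_tilde / n is the identity.

    Minimizes (1 / 2n) * ||y - X_tilde b||^2 + lam * ||b||_1, which has a
    closed form in this special case.
    """
    n = X_tilde.shape[0]
    return soft_threshold(X_tilde.T @ (y - y.mean()) / n, lam)

# Toy data (hypothetical): 5 strongly correlated features, 2 of them active.
rng = np.random.default_rng(0)
n, p, rho = 200, 5, 0.9
idx = np.arange(p)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta + 0.5 * rng.standard_normal(n)

X_tilde, S_inv_sqrt = whiten(X)               # decorrelated design matrix
b_tilde = lasso_orthonormal(X_tilde, y, lam=0.1)
beta_hat = S_inv_sqrt @ b_tilde               # map back to the original scale
```

In this sketch the penalty acts on the coefficients of the whitened design, so the back-transformed `beta_hat` is generally not sparse; any further selection step on the original scale is outside what the sketch covers. In genuinely high-dimensional data (p > n) the sample covariance is singular, so a different estimate of the correlation structure would be needed.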