Paper Title

BigSurvSGD: Big Survival Data Analysis via Stochastic Gradient Descent

Paper Authors

Aliasghar Tarkhan, Noah Simon

Paper Abstract

In many biomedical applications, outcome is measured as a "time-to-event" (e.g., disease progression or death). To assess the connection between features of a patient and this outcome, it is common to assume a proportional hazards model, and fit a proportional hazards regression (or Cox regression). To fit this model, a log-concave objective function known as the "partial likelihood" is maximized. For moderate-sized datasets, an efficient Newton-Raphson algorithm that leverages the structure of the objective can be employed. However, in large datasets this approach has two issues: 1) The computational tricks that leverage structure can also lead to computational instability; 2) The objective does not naturally decouple: Thus, if the dataset does not fit in memory, the model can be very computationally expensive to fit. This additionally means that the objective is not directly amenable to stochastic gradient-based optimization methods. To overcome these issues, we propose a simple, new framing of proportional hazards regression: This results in an objective function that is amenable to stochastic gradient descent. We show that this simple modification allows us to efficiently fit survival models with very large datasets. This also facilitates training complex, e.g. neural-network-based, models with survival data.
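
For reference, the objective the abstract refers to is the standard Cox partial log-likelihood. For data $(t_i, \delta_i, x_i)$, $i = 1, \dots, n$, with observed time $t_i$, event indicator $\delta_i$, covariates $x_i$, and coefficient vector $\beta$, it can be written as

$$\ell(\beta) = \sum_{i:\,\delta_i = 1} \Big[ x_i^\top \beta - \log \sum_{j \in R(t_i)} \exp\big(x_j^\top \beta\big) \Big], \qquad R(t_i) = \{\, j : t_j \ge t_i \,\},$$

where $R(t_i)$ is the risk set at time $t_i$. The sum over the risk set couples each term to (potentially) all other observations, which is why the objective does not decompose into a sum of per-observation losses and is not directly usable with standard stochastic gradient methods.

The sketch below illustrates the general idea of optimizing a Cox objective by stochastic gradient descent over small random batches of patients, with each batch's partial likelihood computed using only that batch's own risk sets. This is a minimal illustration under our own assumptions, not the paper's exact estimator; the function names, batch handling, and finite-difference gradient are choices made here for brevity, and ties in event times are ignored.

import numpy as np

def batch_neg_partial_loglik(beta, times, events, X):
    # Negative Cox partial log-likelihood on one small batch, using only the
    # batch's own risk sets (an illustrative choice; ties in times are ignored).
    order = np.argsort(-times)               # sort patients by descending time
    X, events = X[order], events[order]
    eta = X @ beta                            # linear predictors x_i' beta
    log_risk = np.logaddexp.accumulate(eta)   # log of sum of exp(eta_j) over each risk set
    return -np.sum(events * (eta - log_risk))

def batch_gradient(beta, times, events, X, eps=1e-6):
    # Central finite-difference gradient, to keep the sketch dependency-free.
    grad = np.zeros_like(beta)
    for k in range(beta.size):
        e = np.zeros_like(beta)
        e[k] = eps
        grad[k] = (batch_neg_partial_loglik(beta + e, times, events, X)
                   - batch_neg_partial_loglik(beta - e, times, events, X)) / (2 * eps)
    return grad

def sgd_cox(times, events, X, batch_size=20, lr=0.05, epochs=50, seed=0):
    # Stochastic gradient descent over random patient batches; only one batch
    # needs to be held in memory at a time.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            beta -= lr * batch_gradient(beta, times[idx], events[idx], X[idx])
    return beta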
