Title
Disability prediction in multiple sclerosis using performance outcome measures and demographic data
Authors
Abstract
Literature on machine learning for multiple sclerosis (MS) has primarily focused on using neuroimaging data, such as magnetic resonance imaging, and clinical laboratory tests for disease identification. However, studies have shown that these modalities do not consistently reflect disease activity, such as symptoms or disease progression. Furthermore, collecting data from these modalities is costly, which makes evaluations scarce. In this work, we use multi-dimensional, affordable, physical and smartphone-based performance outcome measures (POMs) in conjunction with demographic data to predict MS disease progression. We perform a rigorous benchmarking exercise on two datasets and present results across 13 clinically actionable prediction endpoints and 6 machine learning models. To the best of our knowledge, our results on these two datasets are the first to show that disease progression can be predicted from POMs and demographic data in the context of both clinical trials and smartphone-based studies. Moreover, we investigate our models through feature ablation studies to understand how different POMs and demographic variables affect model performance. We also show that model performance is similar across demographic subgroups defined by age and sex. To enable this work, we developed an end-to-end, reusable pre-processing and machine learning framework that allows quicker experimentation across disparate MS datasets.
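To make the idea of a feature ablation study concrete, the following is a minimal sketch under stated assumptions, not the authors' framework. It uses synthetic toy data, hypothetical feature names (`walk_speed`, `peg_time`, `age`) grouped into hypothetical POM and demographic groups, and a nearest-centroid classifier chosen only for brevity rather than any of the six models benchmarked in the paper. Each feature group is removed in turn, the model is refit on the remaining features, and the drop in test accuracy relative to the full-feature model is taken as that group's contribution.

```python
import random

# Hypothetical feature groups for illustration only; they are not the paper's actual POMs.
FEATURE_GROUPS = {
    "walking": ["walk_speed"],
    "dexterity": ["peg_time"],
    "demographics": ["age"],
}


def make_toy_data(n, seed):
    """Synthetic toy data: the label depends on walk_speed and peg_time, never on age."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = {name: rng.gauss(0.0, 1.0) for name in ("walk_speed", "peg_time", "age")}
        labels.append(1 if row["peg_time"] - row["walk_speed"] > 0 else 0)
        rows.append(row)
    return rows, labels


def fit_centroids(rows, labels, features):
    """Nearest-centroid classifier: store the per-class mean of each selected feature."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[f] for r in members) / len(members) for f in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = [row[f] for f in features]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


def accuracy(centroids, rows, labels, features):
    hits = sum(predict(centroids, r, features) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def ablation_study(train, test, groups):
    """Return full-feature accuracy and the accuracy drop from removing each group."""
    all_features = [f for fs in groups.values() for f in fs]
    full = accuracy(fit_centroids(*train, all_features), *test, all_features)
    drops = {}
    for name, group_features in groups.items():
        kept = [f for f in all_features if f not in group_features]
        model = fit_centroids(*train, kept)
        drops[name] = full - accuracy(model, *test, kept)
    return full, drops


train = make_toy_data(2000, seed=0)
test = make_toy_data(1000, seed=1)
full_acc, drops = ablation_study(train, test, FEATURE_GROUPS)
print(f"full-feature accuracy: {full_acc:.3f}")
for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"without {name}: accuracy drop {drop:+.3f}")
```

Because the synthetic label ignores `age` by construction, removing either hypothetical POM group should cost far more accuracy than removing the demographic feature. Real POM data would not separate so cleanly, and the same loop could be repeated per endpoint or per model.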