Evaluating Progress on Machine Learning for Longitudinal Electronic Healthcare Data

Paper Title

Evaluating Progress on Machine Learning for Longitudinal Electronic Healthcare Data

Authors

David Bellamy, Leo Celi, Andrew L. Beam

Abstract

The Large Scale Visual Recognition Challenge based on the well-known ImageNet dataset catalyzed an intense flurry of progress in computer vision. Benchmark tasks have propelled other sub-fields of machine learning forward at an equally impressive pace, but in healthcare it has primarily been image processing tasks, such as in dermatology and radiology, that have experienced similar benchmark-driven progress. In the present study, we performed a comprehensive review of benchmarks in medical machine learning for structured data, identifying one based on the Medical Information Mart for Intensive Care (MIMIC-III) that allows the first direct comparison of predictive performance and thus the evaluation of progress on four clinical prediction tasks: mortality, length of stay, phenotyping, and patient decompensation. We find that little meaningful progress has been made over a 3-year period on these tasks, despite significant community engagement. Through our meta-analysis, we find that the performance of deep recurrent models is only superior to logistic regression on certain tasks. We conclude with a synthesis of these results, possible explanations, and a list of desirable qualities for future benchmarks in medical machine learning.
