Paper Title

GenHPF: General Healthcare Predictive Framework with Multi-task Multi-source Learning

Paper Authors

Hur, Kyunghoon, Oh, Jungwoo, Kim, Junu, Kim, Jiyoun, Lee, Min Jae, Cho, Eunbyeol, Moon, Seong-Eun, Kim, Young-Hak, Atallah, Louis, Choi, Edward

Abstract

Despite the remarkable progress in the development of predictive models for healthcare, applying these algorithms on a large scale has been challenging. Algorithms trained on a particular task, based on specific data formats available in a set of medical records, tend to not generalize well to other tasks or databases in which the data fields may differ. To address this challenge, we propose General Healthcare Predictive Framework (GenHPF), which is applicable to any EHR with minimal preprocessing for multiple prediction tasks. GenHPF resolves heterogeneity in medical codes and schemas by converting EHRs into a hierarchical textual representation while incorporating as many features as possible. To evaluate the efficacy of GenHPF, we conduct multi-task learning experiments with single-source and multi-source settings, on three publicly available EHR datasets with different schemas for 12 clinically meaningful prediction tasks. Our framework significantly outperforms baseline models that utilize domain knowledge in multi-source learning, improving average AUROC by 1.2%P in pooled learning and 2.6%P in transfer learning while also showing comparable results when trained on a single EHR dataset. Furthermore, we demonstrate that self-supervised pretraining using multi-source datasets is effective when combined with GenHPF, resulting in a 0.6%P AUROC improvement compared to models without pretraining. By eliminating the need for preprocessing and feature engineering, we believe that this work offers a solid framework for multi-task and multi-source learning that can be leveraged to speed up the scaling and usage of predictive algorithms in healthcare.
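The abstract's core idea is to resolve schema and code heterogeneity by flattening every clinical event into text, so that databases with different tables and field names share one input space. The following is a toy sketch of that idea only, not the paper's actual implementation; all event dictionaries, field names, and the `[EVT]` separator token are illustrative assumptions.

```python
# Toy sketch (illustrative, not GenHPF's code): serialize heterogeneous EHR
# events into a hierarchical textual representation, so that sources with
# different schemas map to the same text-based input space.

def event_to_text(event: dict) -> str:
    """Flatten one clinical event into a 'field_name value' token sequence,
    led by its source table name."""
    parts = [event.get("table", "")]
    for name, value in event.items():
        if name == "table":
            continue
        parts.append(f"{name} {value}")
    return " ".join(parts)

def patient_to_text(events: list[dict]) -> str:
    """Join a patient's time-ordered events into one sequence, separated by
    an assumed [EVT] boundary token."""
    return " [EVT] ".join(event_to_text(e) for e in events)

# Events from two hypothetical sources with different schemas end up as
# comparable text, with no per-database feature engineering.
mimic_style = {"table": "prescriptions", "drug": "aspirin",
               "dose_val_rx": "81", "dose_unit_rx": "mg"}
eicu_style = {"table": "medication", "drugname": "aspirin",
              "dosage": "81 mg"}

print(patient_to_text([mimic_style, eicu_style]))
```

Because the field names themselves become part of the input text, a model trained this way never needs a hand-built mapping between, say, `drug`/`dose_val_rx` in one database and `drugname`/`dosage` in another.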
