Paper Title
Textual Data Augmentation for Patient Outcomes Prediction
Paper Authors
Paper Abstract
Deep learning models have demonstrated superior performance in various healthcare applications. However, a major limitation of these deep models is often the lack of high-quality training data, owing to the private and sensitive nature of the field. In this study, we propose a novel textual data augmentation method that generates artificial clinical notes for patients' Electronic Health Records (EHRs), which can be used as additional training data for patient outcomes prediction. Essentially, we fine-tune the generative language model GPT-2 on the original training data so that it synthesizes labeled text. More specifically, we propose a teacher-student framework in which we first pre-train a teacher model on the original data and then train a student model on the GPT-augmented data under the guidance of the teacher. We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate. The experimental results show that deep models achieve better predictive performance when trained with the augmented data, indicating the effectiveness of the proposed architecture.
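The abstract does not say how GPT-2 is conditioned on the outcome label or how the teacher guides the student. The sketch below shows one plausible reading, under stated assumptions: each note is prefixed with a hypothetical label token so a causal language model learns to generate notes for a given 30-day readmission label, and the student is trained on a synthetic note with a loss that blends the label used to prompt generation and the teacher's predicted probability for that note. The label tokens, the function names, and the weight `alpha` are all hypothetical, and the blended loss is a standard knowledge-distillation-style objective swapped in for illustration, not necessarily the authors' method.

```python
import math
from typing import List, Tuple

# Hypothetical control tokens for the 30-day readmission label (assumption).
LABEL_TOKENS = {0: "<|no_readmit|>", 1: "<|readmit|>"}


def to_lm_example(note: str, label: int) -> str:
    """Prefix a clinical note with its label token so a causal LM (e.g. GPT-2)
    can be fine-tuned on p(note | label); prompting with a label token at
    generation time then yields synthetic notes intended to carry that label."""
    return f"{LABEL_TOKENS[label]} {note.strip()}"


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability p and a
    (possibly soft) target in [0, 1]; p is clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def student_loss(student_p: float, gen_label: int, teacher_p: float,
                 alpha: float = 0.5) -> float:
    """Assumed teacher guidance: weight the hard label used to prompt
    generation by alpha and the teacher's soft prediction on the same
    synthetic note by (1 - alpha)."""
    hard = bce(student_p, float(gen_label))
    soft = bce(student_p, teacher_p)
    return alpha * hard + (1.0 - alpha) * soft


def batch_loss(batch: List[Tuple[float, int, float]], alpha: float = 0.5) -> float:
    """Mean student loss over (student_p, gen_label, teacher_p) triples."""
    return sum(student_loss(s, y, t, alpha) for s, y, t in batch) / len(batch)


if __name__ == "__main__":
    print(to_lm_example("Pt discharged home on day 4.", 1))
    print(batch_loss([(0.9, 1, 0.9), (0.2, 0, 0.1)]))
```

With this weighting, a synthetic note whose generation label the teacher disagrees with contributes a softer target, which is one way to limit the effect of noisy generated samples on the student.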