PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning

Paper Title

PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning

Authors

Zifeng Wang, Jimeng Sun

Abstract

Accessing longitudinal multimodal Electronic Healthcare Records (EHRs) is challenging due to privacy concerns, which hinders the use of ML for healthcare applications. Synthetic EHR generation bypasses the need to share sensitive real patient records. However, existing methods generate single-modal EHRs by unconditional generation or by longitudinal inference, which offers little flexibility and yields unrealistic EHRs. In this work, we propose to formulate EHR generation as a text-to-text translation task with language models (LMs), which allows highly flexible event imputation during generation. We also design prompt learning to control generation conditioned on numerical and categorical demographic features. We evaluate the quality of synthetic EHRs with two perplexity measures that account for their longitudinal pattern (longitudinal imputation perplexity, lpl) and the connections across modalities (cross-modality imputation perplexity, mpl). Moreover, we use two adversaries, membership and attribute inference attacks, for privacy evaluation. Experiments on MIMIC-III data demonstrate the superiority of our methods for realistic EHR generation (a 53.1% decrease in lpl and a 45.3% decrease in mpl on average compared to the best baselines) with low privacy risks. Software is available at https://github.com/RyanWangZf/PromptEHR.

Paper Title