Paper Title

A Corpus for Detecting High-Context Medical Conditions in Intensive Care Patient Notes Focusing on Frequently Readmitted Patients

Authors

Moseley, Edward T., Wu, Joy T., Welt, Jonathan, Foote, John, Tyler, Patrick D., Grant, David W., Carlson, Eric T., Gehrmann, Sebastian, Dernoncourt, Franck, Celi, Leo Anthony

Abstract

A crucial step within secondary analysis of electronic health records (EHRs) is to identify the patient cohort under investigation. While EHRs contain medical billing codes that aim to represent the conditions and treatments patients may have, much of the information is only present in the patient notes. Therefore, it is critical to develop robust algorithms to infer patients' conditions and treatments from their written notes. In this paper, we introduce a dataset for patient phenotyping, a task that is defined as the identification of whether a patient has a given medical condition (also referred to as clinical indication or phenotype) based on their patient note. Nursing Progress Notes and Discharge Summaries from the Intensive Care Unit of a large tertiary care hospital were manually annotated for the presence of several high-context phenotypes relevant to treatment and risk of re-hospitalization. This dataset contains 1102 Discharge Summaries and 1000 Nursing Progress Notes. Each Discharge Summary and Progress Note has been annotated by at least two expert human annotators (one clinical researcher and one resident physician). Annotated phenotypes include treatment non-adherence, chronic pain, advanced/metastatic cancer, as well as 10 other phenotypes. This dataset can be utilized for academic and industrial research in medicine and computer science, particularly within the field of medical natural language processing.
