Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns

Paper title

Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns

Authors

Xixi Lu, Seyed Amin Tabatabaei, Mark Hoogendoorn, Hajo A. Reijers

Abstract

Trace clustering has increasingly been applied to find homogeneous process executions. However, current techniques have difficulty finding a meaningful and insightful clustering of patients on the basis of healthcare data. The resulting clusters are often not in line with those of medical experts, nor are they guaranteed to yield meaningful process maps of patients' clinical pathways. After all, a single hospital may conduct thousands of distinct activities and generate millions of events per year. In this paper, we propose a novel trace clustering approach that uses sample sets of patients provided by medical experts. More specifically, we learn frequent sequence patterns on a sample set, rank each patient based on the patterns, and use an automated approach to determine the corresponding cluster. Each cluster is found separately, and the frequent sequence patterns are also used to discover a process map. The approach is implemented in ProM and evaluated using a large data set obtained from a university medical center. The evaluation shows F1-scores of 0.7 for grouping kidney injury, 0.9 for diabetes, and 0.64 for head/neck tumor, while, according to the domain experts, the process maps show meaningful behavioral patterns of the clinical pathways of these groups.
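The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It mines frequent sequence patterns on an expert-provided sample, ranks every patient by those patterns, cuts the ranking automatically, and scores the resulting cluster against expert labels with an F1-score. This is not the authors' ProM implementation. The pattern definition (ordered, gap-tolerant subsequences grown Apriori-style), the support-weighted score, the "cut at the largest drop" rule, and all function names, activity names and toy traces are assumptions made for illustration only.

```python
"""Hypothetical sketch of sample-based trace clustering with frequent sequence patterns."""


def is_subsequence(pattern, trace):
    """True if the activities of `pattern` occur in `trace` in order (gaps allowed)."""
    it = iter(trace)
    # `in` advances the iterator, so each match must come after the previous one.
    return all(activity in it for activity in pattern)


def support(pattern, traces):
    """Fraction of traces that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, t) for t in traces) / len(traces)


def mine_frequent_patterns(sample, min_support=0.5, max_len=3):
    """Apriori-style growth: extend frequent patterns by one frequent activity at a time.

    Support is anti-monotone, so only frequent patterns need to be extended.
    """
    if not sample:
        raise ValueError("sample set must not be empty")
    activities = sorted({a for trace in sample for a in trace})
    level = {}
    for a in activities:
        s = support((a,), sample)
        if s >= min_support:
            level[(a,)] = s
    frequent_items = [p[0] for p in level]
    patterns = dict(level)
    for _ in range(max_len - 1):
        next_level = {}
        for prefix in level:
            for a in frequent_items:
                candidate = prefix + (a,)
                s = support(candidate, sample)
                if s >= min_support:
                    next_level[candidate] = s
        if not next_level:
            break
        patterns.update(next_level)
        level = next_level
    return patterns


def score(trace, patterns):
    """Support-weighted share of the sample's patterns contained in `trace` (assumed scoring)."""
    total = sum(patterns.values())
    matched = sum(s for p, s in patterns.items() if is_subsequence(p, trace))
    return matched / total if total else 0.0


def select_cluster(log, patterns):
    """Rank all cases by score and cut the ranking at the largest drop (assumed cut rule)."""
    scores = {case_id: score(trace, patterns) for case_id, trace in log.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return set(ranked)
    drops = [scores[ranked[i]] - scores[ranked[i + 1]] for i in range(len(ranked) - 1)]
    cut = drops.index(max(drops)) + 1
    return set(ranked[:cut])


def f1(predicted, actual):
    """F1-score of a predicted cluster against an expert-labelled group."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy data invented for illustration; activity names are hypothetical.
SAMPLE = [
    ["register", "lab", "dialysis", "discharge"],
    ["register", "dialysis", "lab", "dialysis", "discharge"],
]
LOG = {
    "p1": ["register", "lab", "dialysis", "discharge"],
    "p2": ["register", "lab", "dialysis", "lab", "dialysis", "discharge"],
    "p3": ["register", "xray", "surgery", "discharge"],
    "p4": ["register", "xray", "discharge"],
}

if __name__ == "__main__":
    patterns = mine_frequent_patterns(SAMPLE, min_support=1.0, max_len=3)
    cluster = select_cluster(LOG, patterns)
    print(sorted(cluster))  # ['p1', 'p2']
    print(f1(cluster, {"p1", "p2"}))  # 1.0
```

Each cluster is computed independently from its own sample, which mirrors the "find each cluster separately" idea. On event data with thousands of distinct activities, the naive candidate generation above would be far too slow, and a dedicated sequential pattern miner such as PrefixSpan or SPADE would be the natural replacement.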
