Paper Title

BRATsynthetic: Text De-identification using a Markov Chain Replacement Strategy for Surrogate Personal Identifying Information

Authors

Osborne, John D.; O'Leary, Tobias; Nadimpalli, Akhil; Aly, Salma M.; Kennedy, Richard E.

Abstract

Objective: Implement and assess personal health identifying information (PHI) substitution strategies and quantify their privacy-preserving benefits. Materials and Methods: We implement and assess 3 different "Hiding in Plain Sight" (HIPS) strategies for PHI replacement: a standard Consistent replacement strategy, a Random replacement strategy, and a novel Markov model-based strategy. We evaluate the privacy-preserving benefits of these strategies on a synthetic PHI distribution and on real clinical corpora from 2 different institutions, using a range of false negative error rates (FNER). Results: With FNER ranging from 0.1% to 5%, PHI leakage at the document level could be reduced from 27.1% to 0.1% (at 0.1% FNER) and from 94.2% to 57.7% (at 5% FNER) by using the Markov chain strategy rather than the Consistent strategy on a corpus containing a diverse set of notes from the University of Alabama at Birmingham (UAB). The Markov chain substitution strategy also consistently outperformed the Consistent and Random substitution strategies on a MIMIC corpus of discharge summaries and on a range of synthetic clinical PHI distributions. Discussion: We demonstrate that a Markov chain surrogate generation strategy substantially reduces the chance of inadvertent PHI release across a range of assumed PHI FNER, and we release our implementation, "BRATsynthetic", on GitHub. Conclusion: The Markov chain replacement strategy allows for the release of larger de-identified corpora at the same risk level relative to corpora released using a Consistent HIPS strategy.
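
As a rough illustration of how the three replacement strategies named in the abstract might differ, the Python sketch below applies each one to the PHI mentions of a single document. Everything in it is hypothetical: the function names, the surrogate pool, and the `p_switch` parameter are invented for this example, and the keep-or-switch chain is only an assumed reading of "Markov model-based", not the BRATsynthetic implementation.

```python
import random


def consistent_replace(mentions, pool, rng):
    """Consistent: a given original value always maps to the same surrogate."""
    mapping = {}
    out = []
    for m in mentions:
        if m not in mapping:
            mapping[m] = rng.choice(pool)
        out.append(mapping[m])
    return out


def random_replace(mentions, pool, rng):
    """Random: every mention gets an independently drawn surrogate."""
    return [rng.choice(pool) for _ in mentions]


def markov_replace(mentions, pool, rng, p_switch=0.5):
    """Assumed Markov-style strategy: at each mention, keep the current
    surrogate for that original value or, with probability p_switch,
    switch to a newly drawn one (p_switch is a hypothetical parameter)."""
    current = {}
    out = []
    for m in mentions:
        if m not in current or rng.random() < p_switch:
            current[m] = rng.choice(pool)
        out.append(current[m])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    mentions = ["Smith", "Jones", "Smith", "Smith"]
    pool = ["Adams", "Baker", "Clark", "Davis"]
    print(consistent_replace(mentions, pool, rng))
    print(random_replace(mentions, pool, rng))
    print(markov_replace(mentions, pool, rng, p_switch=0.5))
```

In this sketch, `p_switch=0.0` reduces the Markov-style strategy to the Consistent one, and `p_switch=1.0` redraws a surrogate at every mention, as the Random one does. Intermediate values show how a single parameter can trade surrogate consistency against the ability to tell a missed PHI item apart from its surrogates.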
