The NLP Sandbox: an efficient model-to-data system to enable federated and unbiased evaluation of clinical NLP models

Paper title

The NLP Sandbox: an efficient model-to-data system to enable federated and unbiased evaluation of clinical NLP models

Authors

Yan, Yao; Yu, Thomas; Muenzen, Kathleen; Liu, Sijia; Boyle, Connor; Koslowski, George; Zheng, Jiaxin; Dobbins, Nicholas; Essien, Clement; Liu, Hongfang; Omberg, Larsson; Yetisgen, Meliha; Taylor, Bradley; Eddy, James A; Guinney, Justin; Mooney, Sean; Schaffter, Thomas

Abstract

Objective: The evaluation of natural language processing (NLP) models for clinical text de-identification relies on the availability of clinical notes, which are often restricted due to privacy concerns. The NLP Sandbox addresses the lack of data and evaluation frameworks for NLP models by adopting a federated, model-to-data approach. This enables unbiased, federated model evaluation without the need to share sensitive data across multiple institutions.

Materials and Methods: We leveraged the Synapse collaborative framework, containerization software, and OpenAPI Generator to build the NLP Sandbox (nlpsandbox.io). We evaluated two state-of-the-art de-identification-focused NLP annotation models, Philter and NeuroNER, using data from three institutions. We further validated model performance using data from an external validation site.

Results: We demonstrated the usefulness of the NLP Sandbox through the evaluation of clinical de-identification models. An external developer was able to incorporate their model into the NLP Sandbox template and provide feedback on the user experience.

Discussion: We demonstrated the feasibility of using the NLP Sandbox to conduct a multi-site evaluation of clinical text de-identification models without sharing data. Standardized model and data schemas enable smooth model transfer and implementation. To generalize the NLP Sandbox, data owners and model developers need to develop suitable, standardized schemas and adapt their data or models to fit those schemas.

Conclusions: The NLP Sandbox lowers the barrier to using clinical data for NLP model evaluation and facilitates federated, multi-site, unbiased evaluation of NLP models.
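The model-to-data idea in the abstract can be illustrated with a minimal Python sketch. For illustration only, it assumes that a de-identification model is a plain function returning protected health information (PHI) character spans, and that scoring uses exact span-and-type matching. The names `Note`, `evaluate_at_site`, `federated_evaluation`, `toy_date_model`, and the toy sites are hypothetical. They are not the NLP Sandbox API, which packages models as containers exposing OpenAPI-specified services. What the sketch shows is the direction of flow: the model travels to each site, and only aggregate counts and scores come back, never the notes or their annotations.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Assumption: a PHI annotation is a (start, end, phi_type) character span.
Span = Tuple[int, int, str]


@dataclass
class Note:
    text: str
    gold: Set[Span]  # site-private reference annotations; never leave the site


# Hypothetical model interface: note text in, predicted PHI spans out.
Model = Callable[[str], Set[Span]]


def evaluate_at_site(model: Model, notes: List[Note]) -> Dict[str, int]:
    """Runs inside the data owner's environment; only counts are returned."""
    tp = fp = fn = 0
    for note in notes:
        pred = model(note.text)
        tp += len(pred & note.gold)   # exact span-and-type matches
        fp += len(pred - note.gold)   # predicted but not in the reference
        fn += len(note.gold - pred)   # in the reference but missed
    return {"tp": tp, "fp": fp, "fn": fn}


def scores(counts: Dict[str, int]) -> Dict[str, float]:
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def federated_evaluation(model: Model, sites: Dict[str, List[Note]]) -> Dict[str, Dict[str, float]]:
    """Send the model to every site; collect per-site scores plus a pooled micro-average."""
    per_site_counts = {name: evaluate_at_site(model, notes) for name, notes in sites.items()}
    results = {name: scores(c) for name, c in per_site_counts.items()}
    pooled = {k: sum(c[k] for c in per_site_counts.values()) for k in ("tp", "fp", "fn")}
    results["pooled"] = scores(pooled)
    return results


def toy_date_model(text: str) -> Set[Span]:
    # Hypothetical stand-in for a containerized de-identification model:
    # flags ISO-style dates (YYYY-MM-DD) as PHI of type "DATE".
    return {(m.start(), m.end(), "DATE") for m in re.finditer(r"\d{4}-\d{2}-\d{2}", text)}


# Toy data standing in for two institutions' private notes (hypothetical).
sites = {
    "site_a": [Note("Seen on 2020-03-14 by Dr. Smith.", {(8, 18, "DATE"), (26, 31, "NAME")})],
    "site_b": [Note("Admitted 2021-07-01, discharged 2021-07-05.", {(9, 19, "DATE"), (32, 42, "DATE")})],
}

results = federated_evaluation(toy_date_model, sites)
for site, s in results.items():
    print(site, s)
```

In this sketch, each `evaluate_at_site` call would run behind the data owner's firewall. Pooling the raw counts, rather than averaging per-site F1, yields a micro-averaged score without any site exposing its annotations. Here, `site_a` scores a recall of 0.5 because the toy model misses the name.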
