Title
Preparing data for pathological artificial intelligence with clinical-grade performance
Authors
Abstract
[Purpose] Pathology is decisive for disease diagnosis but relies heavily on experienced pathologists. Recently, pathological artificial intelligence (PAI) has been thought to improve diagnostic accuracy and efficiency. However, the high performance of deep-learning-based PAI in the laboratory generally cannot be reproduced in the clinic. [Methods] Because data preparation is important for PAI, this paper reviews PAI-related studies in the PubMed database published from January 2017 to February 2022; 118 studies were included. Methods for preparing data are analyzed in depth, including obtaining slides of pathological tissue, cleaning, screening, and digitizing. Expert review, image annotation, and dataset division for model training and validation are also discussed. We further discuss why the high performance of PAI is not reproducible in clinical practice and present some effective ways to improve the clinical performance of PAI. [Results] The robustness of PAI depends on the randomized collection of representative disease slides, including rigorous quality control and screening, correction of digital discrepancies, reasonable annotation, and the amount of data. Digital pathology is fundamental to clinical-grade PAI, and data standardization techniques and weakly supervised learning methods based on whole slide images (WSIs) are effective ways to overcome obstacles to performance reproduction. [Conclusion] Representative data, the amount of labeling, and consistency across multiple centers are the keys to performance reproduction. Digital pathology for clinical diagnosis, data standardization, and WSI-based weakly supervised learning are expected to help build clinical-grade PAI.
Keywords: pathological artificial intelligence; data preparation; clinical-grade; deep learning
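The abstract mentions dataset division for model training and validation, and data drawn from multiple centers. The following is a minimal sketch of one common way such a division might be done. It is not the procedure used by the reviewed studies. The record keys (`slide_id`, `patient_id`, `center`), the function name `split_slides`, and the 20% validation fraction are all assumptions made for illustration. The sketch holds out one center entirely as an external test set and splits the remaining slides by patient, so that no patient contributes slides to both the training and the validation subset.

```python
import random
from collections import defaultdict


def split_slides(slides, external_center, val_fraction=0.2, seed=0):
    """Split slide records into train / internal validation / external test.

    Each record is a dict with the hypothetical keys 'slide_id',
    'patient_id', and 'center'. Slides from `external_center` are held out
    entirely; the remaining patients are shuffled and divided so that all
    slides of one patient land in the same subset.
    """
    external = [s for s in slides if s["center"] == external_center]
    internal = [s for s in slides if s["center"] != external_center]

    # Group internal slides by patient to prevent patient-level leakage.
    by_patient = defaultdict(list)
    for s in internal:
        by_patient[s["patient_id"]].append(s)

    # Sort before shuffling so the result depends only on the seed.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Reserve at least one patient for validation when any are available.
    n_val = max(1, round(len(patients) * val_fraction)) if patients else 0

    val = [s for p in patients[:n_val] for s in by_patient[p]]
    train = [s for p in patients[n_val:] for s in by_patient[p]]
    return train, val, external


# Illustrative data: 8 patients with 2 slides each; centers "A" and "B".
slides = [
    {"slide_id": f"S{i}", "patient_id": f"P{i // 2}", "center": "A" if i < 12 else "B"}
    for i in range(16)
]
train, val, external = split_slides(slides, external_center="B")
```

Splitting by patient rather than by slide is a design choice: slides from the same patient tend to be correlated, so a slide-level split could overstate validation performance. The held-out center gives a rough check on whether performance carries over to data from a different site.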