Supervised Machine Learning Algorithm for Detecting Consistency between Reported Findings and the Conclusions of Mammography Reports

Title

Supervised Machine Learning Algorithm for Detecting Consistency between Reported Findings and the Conclusions of Mammography Reports

Authors

Berdichevsky, Alexander; Peleg, Mor; Rubin, Daniel L.

Abstract

Objective. Mammography reports document the diagnosis of patients' conditions. However, many reports contain non-standard terms (non-BI-RADS descriptors) and incomplete statements, which can lead to conclusions that are not well supported by the reported findings. Our aim was to develop a tool to detect such discrepancies by comparing the reported conclusions to those that would be expected based on the reported radiology findings. Materials and Methods. A de-identified data set from an academic hospital containing 258 mammography reports, supplemented by 120 reports found on the web, was used for training and evaluation. Spell checking and term normalization were used to unambiguously determine the reported BI-RADS descriptors. The resulting data were input into seven classifiers that classify mammography reports, based on their Findings sections, into the seven BI-RADS final assessment categories. Finally, the semantic similarity score of a report to each BI-RADS category is reported. Results. Our term normalization algorithm correctly identified 97% of the BI-RADS descriptors in mammography reports. Our system achieved 76% precision and 83% recall in classifying the reports according to BI-RADS final assessment category. Discussion. The strength of our approach lies in assigning high importance to BI-RADS terms in the summarization phase, in a semantic similarity measure that accounts for the complex data representation, and in classification into all seven BI-RADS categories. Conclusion. Our approach can automatically detect BI-RADS descriptors and expected final assessment categories with fairly good accuracy, which could be used to alert users when their reported findings do not match their conclusions well.
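The abstract names two steps without giving implementation details: normalizing misspelled or non-standard wording to BI-RADS descriptors, and scoring a report's findings against each final assessment category to see whether the stated conclusion fits. The following is a minimal Python sketch of that idea under explicit assumptions, not the authors' method. The descriptor list, the `SYNONYMS` table, the `PROFILES` weights, and the helper names (`normalize`, `score_report`, `check_consistency`) are all hypothetical; `difflib.get_close_matches` stands in for the spell checker, and cosine similarity against fixed per-category profiles stands in for the paper's seven trained classifiers and its semantic similarity measure.

```python
from difflib import get_close_matches
from math import sqrt
from typing import Dict, List, Optional, Tuple

# Illustrative subset of BI-RADS lexicon descriptors (not the full lexicon).
BIRADS_TERMS = [
    "spiculated", "circumscribed", "irregular", "oval", "round",
    "pleomorphic", "amorphous", "architectural distortion",
]

# Hypothetical table mapping non-standard wording to BI-RADS descriptors.
SYNONYMS = {"well defined": "circumscribed", "stellate": "spiculated"}

# Hypothetical descriptor weights per final assessment category (a subset of
# the seven). These are toy values; the paper trains classifiers instead.
PROFILES: Dict[str, Dict[str, float]] = {
    "BI-RADS 2": {"circumscribed": 1.0, "round": 0.6, "oval": 0.6},
    "BI-RADS 3": {"circumscribed": 0.8, "oval": 1.0, "amorphous": 0.3},
    "BI-RADS 5": {"spiculated": 1.0, "irregular": 0.8,
                  "pleomorphic": 0.7, "architectural distortion": 0.5},
}


def normalize(phrase: str) -> Optional[str]:
    """Map a free-text phrase to a BI-RADS descriptor, or None if no match."""
    p = phrase.lower().strip()
    if p in SYNONYMS:  # known non-standard synonym
        return SYNONYMS[p]
    # Fuzzy string match as a stand-in for spell checking (e.g. "spiculted").
    match = get_close_matches(p, BIRADS_TERMS, n=1, cutoff=0.8)
    return match[0] if match else None


def report_vector(phrases: List[str]) -> Dict[str, float]:
    """Count the normalized BI-RADS descriptors found in a Findings section."""
    vec: Dict[str, float] = {}
    for ph in phrases:
        term = normalize(ph)
        if term is not None:
            vec[term] = vec.get(term, 0.0) + 1.0
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_report(phrases: List[str]) -> Dict[str, float]:
    """Similarity of the report's findings to each category profile."""
    vec = report_vector(phrases)
    return {cat: cosine(vec, prof) for cat, prof in PROFILES.items()}


def check_consistency(phrases: List[str], stated: str) -> Tuple[Optional[str], bool]:
    """Return the best-matching category and whether it equals the stated one."""
    scores = score_report(phrases)
    if not any(scores.values()):  # no recognized descriptors at all
        return None, False
    best = max(scores, key=lambda cat: scores[cat])
    return best, best == stated


if __name__ == "__main__":
    findings = ["spiculted", "irregular", "stellate"]
    print(check_consistency(findings, "BI-RADS 2"))  # ('BI-RADS 5', False)
```

In this toy run, the misspelled "spiculted" and the non-standard "stellate" both normalize to "spiculated", so the findings sit closest to the hypothetical BI-RADS 5 profile and a stated BI-RADS 2 conclusion would be flagged as inconsistent. Reporting the full `score_report` dictionary, rather than only the top category, mirrors the abstract's point that a similarity score to every BI-RADS category is reported.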
