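How Many Annotators Do We Need? - A Study on the Influence of Inter-Observer Variability on the Reliability of Automatic Mitotic Figure Assessment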

Paper Title

How Many Annotators Do We Need? -- A Study on the Influence of Inter-Observer Variability on the Reliability of Automatic Mitotic Figure Assessment

Authors

Wilm, Frauke; Bertram, Christof A.; Marzahl, Christian; Bartel, Alexander; Donovan, Taryn A.; Assenmacher, Charles-Antoine; Becker, Kathrin; Bennett, Mark; Corner, Sarah; Cossic, Brieuc; Denk, Daniela; Dettwiler, Martina; Gonzalez, Beatriz Garcia; Gurtner, Corinne; Lehmbecker, Annika; Merz, Sophie; Plog, Stephanie; Schmidt, Anja; Smedley, Rebecca C.; Tecilla, Marco; Thaiwong, Tuddow; Breininger, Katharina; Kiupel, Matti; Maier, Andreas; Klopfleisch, Robert; Aubreville, Marc

Abstract

The density of mitotic figures in histologic sections is a prognostically relevant characteristic for many tumours. Because of high inter-pathologist variability, deep learning-based algorithms are a promising solution to improve tumour prognostication. Pathologists are the gold standard for database development; however, labelling errors may hamper the development of accurate algorithms. In the present work, we evaluated the benefit of multi-expert consensus (n = 3, 5, 7, 9, 11) on algorithmic performance. While training with individual databases resulted in highly variable F$_1$ scores, performance was notably increased and more consistent when using the consensus of three annotators. Adding more annotators resulted in only minor improvements. We conclude that databases by few pathologists and high label accuracy may be the best compromise between high algorithmic performance and time investment.
