Paper Title

Self-training of Machine Learning Models for Liver Histopathology: Generalization under Clinical Shifts

Paper Authors

Jin Li, Deepta Rajan, Chintan Shah, Dinkar Juyal, Shreya Chakraborty, Chandan Akiti, Filip Kos, Janani Iyer, Anand Sampat, Ali Behrooz

Abstract

Histopathology images are gigapixel-sized and include features and information at different resolutions. Collecting annotations in histopathology requires highly specialized pathologists, making it expensive and time-consuming. Self-training can alleviate annotation constraints by learning from both labeled and unlabeled data, reducing the amount of annotations required from pathologists. We study the design of teacher-student self-training systems for Non-alcoholic Steatohepatitis (NASH) using clinical histopathology datasets with limited annotations. We evaluate the models on in-distribution and out-of-distribution test data under clinical data shifts. We demonstrate that through self-training, the best student model statistically significantly outperforms the teacher, with a $3\%$ absolute improvement in macro F1 score. The best student model also approaches the performance of a fully supervised model trained with twice as many annotations.
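For readers unfamiliar with the technique, the following is a minimal, generic sketch of the teacher-student self-training (pseudo-labeling) loop the abstract refers to, written in PyTorch. It is not the paper's exact pipeline: the function names (pseudo_label, train_student), the 0.9 confidence threshold, and the data loaders are illustrative assumptions only.

import torch
import torch.nn.functional as F


def pseudo_label(teacher, unlabeled_loader, threshold=0.9, device="cpu"):
    """Run a trained teacher over unlabeled images and keep only confident
    predictions as pseudo-labels (assumes the loader yields image batches)."""
    teacher.eval()
    kept_images, kept_targets = [], []
    with torch.no_grad():
        for images in unlabeled_loader:
            probs = F.softmax(teacher(images.to(device)), dim=1)
            conf, preds = probs.max(dim=1)
            keep = (conf >= threshold).cpu()  # confidence filter, threshold is illustrative
            kept_images.append(images[keep])
            kept_targets.append(preds.cpu()[keep])
    return torch.cat(kept_images), torch.cat(kept_targets)


def train_student(student, labeled_loader, pseudo_loader, optimizer, device="cpu"):
    """One epoch of student training on labeled data plus teacher pseudo-labels."""
    student.train()
    for loader in (labeled_loader, pseudo_loader):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(student(images.to(device)), targets.to(device))
            loss.backward()
            optimizer.step()
    return student

In a typical setup, the teacher is first trained on the small labeled set, pseudo_label is run over the unlabeled pool, and the student is then trained on the union of labeled and pseudo-labeled data; this cycle can be repeated with the student becoming the next teacher.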
