Paper Title

A comparative study between vision transformers and CNNs in digital pathology

Authors

Luca Deininger, Bernhard Stimpel, Anil Yuce, Samaneh Abbasi-Sureshjani, Simon Schönenberger, Paolo Ocampo, Konstanty Korski, Fabien Gaire

Abstract

Recently, vision transformers were shown to be capable of outperforming convolutional neural networks when pretrained on sufficient amounts of data. Compared to convolutional neural networks, vision transformers have a weaker inductive bias and therefore allow more flexible feature detection. Motivated by this, this work explores vision transformers for tumor detection in digital pathology whole-slide images across four tissue types, as well as for tissue type identification. We compared the patch-wise classification performance of the vision transformer DeiT-Tiny to that of the state-of-the-art convolutional neural network ResNet18. Because annotated whole-slide images are sparsely available, we further compared both models pretrained on large amounts of unlabeled whole-slide images using state-of-the-art self-supervised approaches. The results show that the vision transformer performed slightly better than ResNet18 for tumor detection in three of the four tissue types, while ResNet18 performed slightly better in the remaining tasks. The aggregated slide-level predictions of both models were correlated, indicating that the models captured similar imaging features. Altogether, the vision transformer models performed on par with ResNet18 while requiring more effort to train. To surpass the performance of convolutional neural networks, vision transformers may require more challenging tasks in which their weak inductive bias becomes a benefit.
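The abstract compares the two models by aggregating patch-wise predictions to slide level and correlating the resulting scores. The sketch below illustrates one plausible version of that comparison, under two assumptions not stated in the abstract: slide scores are the mean of patch-level tumor probabilities, and agreement is measured with Pearson correlation. The patch probabilities here are synthetic placeholders, not data from the paper.

```python
import numpy as np

def slide_score(patch_probs):
    """Aggregate patch-level tumor probabilities into one slide-level score.

    Mean aggregation is an assumption for illustration; the abstract does not
    specify how patch predictions were aggregated.
    """
    return float(np.mean(patch_probs))

# Synthetic stand-ins: 10 slides, 50 patch probabilities each, for two models
# whose predictions differ only by small per-patch noise.
rng = np.random.default_rng(0)
slides = [rng.uniform(size=50) for _ in range(10)]
vit_scores = np.array([slide_score(p) for p in slides])
cnn_scores = np.array(
    [slide_score(np.clip(p + rng.normal(0.0, 0.02, p.shape), 0.0, 1.0)) for p in slides]
)

# Pearson correlation between the slide-level scores of the two models;
# values near 1 indicate the models rank slides similarly.
r = float(np.corrcoef(vit_scores, cnn_scores)[0, 1])
```

With nearly identical patch predictions, `r` comes out close to 1, which is the kind of slide-level agreement the abstract reports between DeiT-Tiny and ResNet18.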
