Paper Title

Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning

Authors

Chen, Richard J., Chen, Chengkuan, Li, Yicong, Chen, Tiffany Y., Trister, Andrew D., Krishnan, Rahul G., Mahmood, Faisal

Abstract

Vision Transformers (ViTs) and their multi-scale and hierarchical variations have been successful at capturing image representations, but their use has generally been studied for low-resolution images (e.g., 256x256, 384x384). For gigapixel whole-slide imaging (WSI) in computational pathology, WSIs can be as large as 150000x150000 pixels at 20X magnification and exhibit a hierarchical structure of visual tokens across varying resolutions: from 16x16 images capturing spatial patterns among cells, to 4096x4096 images characterizing interactions within the tissue microenvironment. We introduce a new ViT architecture called the Hierarchical Image Pyramid Transformer (HIPT), which leverages the natural hierarchical structure inherent in WSIs using two levels of self-supervised learning to learn high-resolution image representations. HIPT is pretrained across 33 cancer types using 10,678 gigapixel WSIs, 408,218 4096x4096 images, and 104M 256x256 images. We benchmark HIPT representations on 9 slide-level tasks, and demonstrate that: 1) HIPT with hierarchical pretraining outperforms current state-of-the-art methods for cancer subtyping and survival prediction, and 2) self-supervised ViTs are able to model important inductive biases about the hierarchical structure of phenotypes in the tumor microenvironment.
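
The nesting of image scales named in the abstract (16x16, 256x256, 4096x4096) can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the authors' code: it assumes each larger image is tiled into non-overlapping squares of the next smaller size, and the helper names `tokens_per_side` and `hierarchy_counts` are hypothetical.

```python
def tokens_per_side(image_size: int, token_size: int) -> int:
    """Number of non-overlapping tokens along one side of a square image."""
    if image_size % token_size != 0:
        raise ValueError("image_size must be a multiple of token_size")
    return image_size // token_size


def hierarchy_counts(sizes=(16, 256, 4096)):
    """For each adjacent pair of scales, count the small tokens per large image.

    Returns a list of (small_size, large_size, tokens_per_image) tuples.
    Assumes non-overlapping tiling, which is an assumption of this sketch.
    """
    counts = []
    for small, large in zip(sizes, sizes[1:]):
        side = tokens_per_side(large, small)
        counts.append((small, large, side * side))
    return counts


if __name__ == "__main__":
    for small, large, n in hierarchy_counts():
        print(f"{large}x{large} image -> {n} tokens of {small}x{small}")
```

Under this tiling assumption, a 256x256 image holds 256 tokens of 16x16, and a 4096x4096 image holds 256 tokens of 256x256, so the sequence length stays the same at both levels. The same factor is consistent with the pretraining figures in the abstract: 408,218 images of 4096x4096 times 256 gives about 104.5 million 256x256 images, in line with the reported 104M.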
