Paper Title
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Paper Authors
Paper Abstract
In digital pathology, Whole Slide Image (WSI) analysis is usually formulated as a Multiple Instance Learning (MIL) problem. Although transformer-based architectures have been used for WSI classification, these methods require modifications to adapt them to the specific challenges of this type of image data. Among these challenges is the amount of memory and compute required by deep transformer models to process long inputs, such as the thousands of image patches that can compose a WSI at $\times 10$ or $\times 20$ magnification. We introduce \textit{SeqShort}, a multi-head attention-based sequence-shortening layer that summarizes each WSI in a fixed- and short-sized sequence of instances, which allows us to reduce the computational cost of self-attention on long sequences and to include positional information that is unavailable in other MIL approaches. Furthermore, we show that WSI classification performance can be improved when the downstream transformer architecture has been pre-trained on a large corpus of text data, while fine-tuning less than 0.1\% of its parameters. We demonstrate the effectiveness of our method on lymph node metastases classification and cancer subtype classification tasks, without the need to design a WSI-specific transformer or perform in-domain pre-training, while keeping a reduced compute budget and a low number of trainable parameters.
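To make the sequence-shortening idea concrete, below is a minimal, illustrative PyTorch sketch of a multi-head attention layer that condenses a long, variable-length sequence of WSI patch embeddings into a fixed, short sequence. It assumes a learned-query cross-attention design; the class name `SeqShortSketch`, the hyperparameters (512-d embeddings, 128 queries, 8 heads), and the single-layer structure are illustrative assumptions and not details taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class SeqShortSketch(nn.Module):
    """Illustrative attention-based sequence-shortening layer (not the paper's code).

    A fixed number of learned query vectors cross-attend to the long sequence of
    patch embeddings extracted from a WSI, producing a fixed-length, short sequence
    that a downstream (e.g. text-pretrained) transformer can process cheaply.
    """

    def __init__(self, embed_dim: int = 512, num_queries: int = 128, num_heads: int = 8):
        super().__init__()
        # Learned queries: their number defines the length of the shortened sequence.
        self.queries = nn.Parameter(torch.randn(num_queries, embed_dim) * 0.02)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, patch_embeddings: torch.Tensor,
                key_padding_mask: torch.Tensor | None = None) -> torch.Tensor:
        # patch_embeddings: (batch, n_patches, embed_dim); n_patches may be thousands.
        batch = patch_embeddings.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        shortened, _ = self.attn(
            q, patch_embeddings, patch_embeddings,
            key_padding_mask=key_padding_mask,
        )
        # Output: (batch, num_queries, embed_dim). Because the length is now fixed,
        # positional embeddings can be added before the downstream transformer.
        return self.norm(shortened)


if __name__ == "__main__":
    wsi_patches = torch.randn(1, 4000, 512)   # e.g. ~4k patch features from one slide
    shortener = SeqShortSketch()
    print(shortener(wsi_patches).shape)        # torch.Size([1, 128, 512])
```

Under this sketch, downstream self-attention operates on the shortened length $k$ rather than the original number of patches $n$, reducing its cost from $\mathcal{O}(n^2)$ to $\mathcal{O}(k^2)$ per layer (with $k \ll n$), and the fixed output length is what makes adding positional information straightforward.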