Paper Title

Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces

Authors

Simon Dahan, Hao Xu, Logan Z. J. Williams, Abdulah Fawaz, Chunhui Yang, Timothy S. Coalson, Michelle C. Williams, David E. Newby, A. David Edwards, Matthew F. Glasser, Alistair A. Young, Daniel Rueckert, Emma C. Robinson

Abstract

Recent state-of-the-art performances of Vision Transformers (ViT) in computer vision tasks demonstrate that a general-purpose architecture, which implements long-range self-attention, could replace the local feature learning operations of convolutional neural networks. In this paper, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem, by proposing patching mechanisms for general surface meshes. Sequences of patches are then processed by a transformer encoder and used for classification or regression. We validate our method on a range of different biomedical surface domains and tasks: brain age prediction in the developing Human Connectome Project (dHCP), fluid intelligence prediction in the Human Connectome Project (HCP), and coronary artery calcium score classification using surfaces from the Scottish Computed Tomography of the Heart (SCOT-HEART) dataset, and investigate the impact of pretraining and data augmentation on model performance. Results suggest that Surface Vision Transformers (SiT) demonstrate consistent improvement over geometric deep learning methods for brain age and fluid intelligence prediction and achieve comparable performance on calcium score classification to standard metrics used in clinical practice. Furthermore, analysis of transformer attention maps offers clear and individualised predictions of the features driving each task. Code is available on GitHub: https://github.com/metrics-lab/surface-vision-transformers
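The core idea the abstract describes — patching a surface mesh so its per-vertex features become a token sequence for a transformer encoder — can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: the patch layout here is random, whereas the paper derives regular patches from an icosahedral tessellation, and the function names (`surface_to_patch_sequence`) are hypothetical.

```python
import numpy as np

def surface_to_patch_sequence(features, patch_indices):
    """Turn per-vertex surface features into a sequence of flattened patch tokens.

    features:      (V, C) array, C feature channels per mesh vertex.
    patch_indices: (N, K) array giving the K vertex indices of each of the
                   N patches (the paper derives these from a regular
                   icosahedral subdivision of the surface; here they are
                   arbitrary for illustration).
    Returns:       (N, K*C) array: one flattened token per patch, ready for
                   a linear embedding and a transformer encoder.
    """
    n_patches, patch_size = patch_indices.shape
    n_channels = features.shape[1]
    # Gather each patch's vertex features and flatten them into one token.
    return features[patch_indices].reshape(n_patches, patch_size * n_channels)

# Toy example: 42 vertices (an icosphere at subdivision level 1), 4 channels,
# split into 6 non-overlapping patches of 7 vertices each.
rng = np.random.default_rng(0)
feats = rng.standard_normal((42, 4))
patches = rng.choice(42, size=(6, 7), replace=False)

tokens = surface_to_patch_sequence(feats, patches)   # shape (6, 28)

# A learned linear projection would then embed each token before the
# transformer encoder; a random matrix stands in for the learned weights.
embed_dim = 16
W = rng.standard_normal((tokens.shape[1], embed_dim))
embedded = tokens @ W                                # shape (6, 16)
```

In the full model, the embedded sequence (plus position embeddings and a regression/classification token) would be fed to a standard transformer encoder, whose attention weights over patches yield the interpretable attention maps mentioned in the abstract.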
