Paper Title
Surface Analysis with Vision Transformers
Paper Authors
Paper Abstract
The extension of convolutional neural networks (CNNs) to non-Euclidean geometries has led to multiple frameworks for studying manifolds. Many of these methods have shown design limitations, resulting in poor modelling of long-range associations, as the generalisation of convolutions to irregular surfaces is non-trivial. The recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs. Motivated by the success of attention modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem, and propose a patching mechanism for surface meshes. We validate the performance of the proposed Surface Vision Transformer (SiT) on two brain age prediction tasks on the developing Human Connectome Project (dHCP) dataset, and investigate the impact of pre-training on model performance. Experiments show that the SiT outperforms many surface CNNs, while indicating some evidence of general transformation invariance. Code is available at https://github.com/metrics-lab/surface-vision-transformers
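The core idea described in the abstract — cutting a surface mesh into a sequence of patches and feeding the resulting tokens to a standard transformer encoder — can be sketched as below. This is a minimal illustration only: the patch count, vertices per patch, feature channels, and model sizes are assumptions chosen for demonstration, not the paper's actual configuration (for which see the linked repository).

import torch
import torch.nn as nn

# All of the following sizes are illustrative assumptions, not the paper's values.
N_PATCHES = 320        # e.g. one patch per face of a low-resolution icosphere
VERTS_PER_PATCH = 153  # vertices in each triangular patch
CHANNELS = 4           # per-vertex features, e.g. cortical metrics
DIM = 192              # transformer embedding dimension

class SurfacePatchTransformer(nn.Module):
    """Toy sequence-to-sequence treatment of a patched surface mesh."""

    def __init__(self):
        super().__init__()
        # Linear projection of each flattened patch into a token embedding
        self.embed = nn.Linear(VERTS_PER_PATCH * CHANNELS, DIM)
        # A [CLS]-style token plus learnable positional embeddings, one per patch
        self.cls = nn.Parameter(torch.zeros(1, 1, DIM))
        self.pos = nn.Parameter(torch.zeros(1, N_PATCHES + 1, DIM))
        layer = nn.TransformerEncoderLayer(
            d_model=DIM, nhead=3, dim_feedforward=4 * DIM, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, 1)  # scalar regression head, e.g. brain age

    def forward(self, patches):
        # patches: (batch, N_PATCHES, VERTS_PER_PATCH * CHANNELS)
        tokens = self.embed(patches)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos
        tokens = self.encoder(tokens)
        return self.head(tokens[:, 0])  # predict from the class token

# Usage: a fake batch of 2 surfaces, already partitioned into patch sequences
x = torch.randn(2, N_PATCHES, VERTS_PER_PATCH * CHANNELS)
print(SurfacePatchTransformer()(x).shape)  # torch.Size([2, 1])

The point of the sketch is that, once the mesh is serialised into patches, no surface-specific convolution is needed: self-attention over the patch tokens models both local and long-range associations directly.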