Paper Title

MRL: Learning to Mix with Attention and Convolutions

Paper Authors

Shlok Mohta, Hisahiro Suganuma, Yoshiki Tanaka

Paper Abstract

In this paper, we present a new neural architectural block for the vision domain, named Mixing Regionally and Locally (MRL), developed with the aim of effectively and efficiently mixing the provided input features. We bifurcate the input feature mixing task into mixing at a regional and a local scale. To achieve an efficient mix, we exploit the domain-wide receptive field provided by self-attention for regional-scale mixing, and convolutional kernels restricted to a local receptive field for local-scale mixing. More specifically, our proposed method mixes regional features associated with local features within a defined region, followed by a local-scale feature mix augmented by the regional features. Experiments show that this hybridization of self-attention and convolution brings improved capacity, generalization (the right inductive bias), and efficiency. Under similar network settings, MRL outperforms or is on par with its counterparts in classification, object detection, and segmentation tasks. We also show that our MRL-based network architecture achieves state-of-the-art performance on H&E histology datasets. We achieved Dice scores of 0.843, 0.855, and 0.892 on the Kumar, CoNSeP, and CPM-17 datasets, respectively, while highlighting the versatility offered by the MRL framework by incorporating layers like group convolutions to improve dataset-specific generalization.
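To make the regional-then-local mixing scheme described above concrete, here is a minimal PyTorch sketch of how such a block could be structured. This is a reconstruction from the abstract alone, not the authors' implementation: the class name MRLBlock, the average pooling used to form regional tokens, the nearest-neighbor broadcast of mixed regional features, and all hyperparameters (region_size, num_heads) are assumptions.

```python
import torch
import torch.nn as nn

class MRLBlock(nn.Module):
    """Hypothetical sketch of a Mixing Regionally and Locally (MRL) style block.

    Regional tokens are pooled from local features within each region, mixed
    with domain-wide self-attention, broadcast back to augment the local
    features, which are then mixed with a local-receptive-field convolution.
    """

    def __init__(self, dim: int, region_size: int = 7, num_heads: int = 4):
        super().__init__()
        self.region_size = region_size
        # Regional-scale mixing: self-attention over one token per region.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Local-scale mixing: depthwise conv restricted to a local receptive field.
        self.local_mix = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.BatchNorm2d(dim),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        r = self.region_size
        assert h % r == 0 and w % r == 0, "H and W must be divisible by region_size"

        # 1) Form regional tokens by pooling local features within each region.
        regions = nn.functional.avg_pool2d(x, kernel_size=r)   # (B, C, H/r, W/r)
        tokens = regions.flatten(2).transpose(1, 2)            # (B, N, C)

        # 2) Regional-scale mixing via domain-wide self-attention.
        tokens = self.norm(tokens)
        mixed, _ = self.attn(tokens, tokens, tokens)           # (B, N, C)

        # 3) Broadcast mixed regional features back to augment local features.
        mixed = mixed.transpose(1, 2).reshape(b, c, h // r, w // r)
        x = x + nn.functional.interpolate(mixed, scale_factor=r, mode="nearest")

        # 4) Local-scale mixing with convolutions (residual).
        return x + self.local_mix(x)

if __name__ == "__main__":
    block = MRLBlock(dim=64, region_size=7)
    out = block(torch.randn(2, 64, 56, 56))
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```

The split mirrors the abstract's description: self-attention is applied only to the small set of regional tokens (cheap, domain-wide receptive field), while per-pixel mixing is left to convolutions with a local receptive field.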
