Paper Title

A Generic Shared Attention Mechanism for Various Backbone Neural Networks

Authors

Zhongzhan Huang, Senwei Liang, Mingfu Liang, Liang Lin

Abstract

The self-attention mechanism has emerged as a critical component for improving the performance of various backbone neural networks. However, current mainstream approaches individually incorporate newly designed self-attention modules (SAMs) into each layer of the network for granted without fully exploiting their parameters' potential. This leads to suboptimal performance and increased parameter consumption as the network depth increases. To improve this paradigm, in this paper, we first present a counterintuitive but inherent phenomenon: SAMs tend to produce strongly correlated attention maps across different layers, with an average Pearson correlation coefficient of up to 0.85. Inspired by this inherent observation, we propose Dense-and-Implicit Attention (DIA), which directly shares SAMs across layers and employs a long short-term memory module to calibrate and bridge the highly correlated attention maps of different layers, thus improving the parameter utilization efficiency of SAMs. This design of DIA is also consistent with the neural network's dynamical system perspective. Through extensive experiments, we demonstrate that our simple yet effective DIA can consistently enhance various network backbones, including ResNet, Transformer, and UNet, across tasks such as image classification, object detection, and image generation using diffusion models.
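
As a rough illustration of the layer-sharing idea described in the abstract, the sketch below reuses a single channel-attention-style module and one LSTM cell across every block of a toy backbone, with the LSTM hidden state linking the attention computed at different depths. The class name `SharedAttention`, its parameters, and the toy backbone are illustrative assumptions for exposition, not the authors' released DIA implementation.

```python
import torch
import torch.nn as nn


class SharedAttention(nn.Module):
    """Minimal sketch of a layer-shared attention module (assumed design).

    A single squeeze-style attention head and a single LSTM cell are created
    once and reused at every layer; the LSTM hidden state carries information
    across depths to calibrate the highly correlated attention maps.
    """

    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # global context per channel
        self.cell = nn.LSTMCell(channels, hidden)   # shared across all layers
        self.proj = nn.Linear(hidden, channels)     # map back to channel weights
        self.state = None                           # (h, c), reset per forward pass

    def reset(self):
        self.state = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        ctx = self.squeeze(x).view(b, c)                # B x C channel descriptor
        self.state = self.cell(ctx, self.state)         # carry state across layers
        attn = torch.sigmoid(self.proj(self.state[0]))  # calibrated channel attention
        return x * attn.view(b, c, 1, 1)                # reweight the feature map


# Usage: the same module instance is applied after every block of a backbone,
# instead of instantiating one new self-attention module per layer.
if __name__ == "__main__":
    dia = SharedAttention(channels=64)
    blocks = nn.ModuleList(nn.Conv2d(64, 64, 3, padding=1) for _ in range(3))
    x = torch.randn(2, 64, 32, 32)
    dia.reset()
    for block in blocks:
        x = dia(block(x))
```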
