Paper Title

Rethinking Efficacy of Softmax for Lightweight Non-Local Neural Networks

Paper Authors

Yooshin Cho, Youngsoo Kim, Hanbyel Cho, Jaesung Ahn, Hyeong Gwon Hong, Junmo Kim

Paper Abstract

The non-local (NL) block is a popular module that demonstrates the capability to model global context. However, the NL block generally has heavy computation and memory costs, so it is impractical to apply it to high-resolution feature maps. In this paper, to investigate the efficacy of the NL block, we empirically analyze whether the magnitude and direction of input feature vectors properly affect the attention between vectors. The results show the inefficacy of the softmax operation that is commonly used to normalize the attention map of the NL block. Attention maps normalized with the softmax operation rely heavily on the magnitude of the key vectors, and performance degrades if the magnitude information is removed. By replacing the softmax operation with a scaling factor, we demonstrate improved performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet. In addition, our method shows robustness to embedding channel reduction and embedding weight initialization. Notably, our method makes multi-head attention employable without additional computational cost.
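
To make the idea concrete, below is a minimal PyTorch sketch of a non-local block in which the usual softmax normalization of the attention map can be swapped for a simple scaling-factor normalization, in the spirit of the abstract. The class name, the `use_softmax` flag, and the choice of 1/N as the scaling constant are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class NonLocalBlock(nn.Module):
    """Embedded-Gaussian-style non-local block (sketch).

    use_softmax=True  -> standard softmax-normalized attention.
    use_softmax=False -> attention divided by a scaling factor
                         (here 1/N, an assumption for illustration).
    """

    def __init__(self, in_channels, embed_channels, use_softmax=True):
        super().__init__()
        self.query = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.key = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.value = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.out = nn.Conv2d(embed_channels, in_channels, kernel_size=1)
        self.use_softmax = use_softmax

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).flatten(2).transpose(1, 2)  # (B, N, C')
        k = self.key(x).flatten(2)                    # (B, C', N)
        v = self.value(x).flatten(2).transpose(1, 2)  # (B, N, C')

        attn = torch.bmm(q, k)                        # (B, N, N) pairwise similarities
        if self.use_softmax:
            attn = attn.softmax(dim=-1)               # standard NL normalization
        else:
            attn = attn / n                           # scaling-factor normalization (assumed 1/N)

        y = torch.bmm(attn, v)                        # aggregate value vectors, (B, N, C')
        y = y.transpose(1, 2).reshape(b, -1, h, w)    # back to (B, C', H, W)
        return x + self.out(y)                        # residual connection


# Example usage (hypothetical shapes):
# block = NonLocalBlock(in_channels=64, embed_channels=32, use_softmax=False)
# out = block(torch.randn(2, 64, 32, 32))
```

The flag only changes how the raw similarity map is normalized; the rest of the block (1x1 projections, aggregation, residual connection) is the standard non-local structure.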
