Paper Title
Efficient Long-Range Attention Network for Image Super-resolution
Paper Authors
Paper Abstract
Recently, transformer-based methods have demonstrated impressive results in various vision tasks, including image super-resolution (SR), by exploiting self-attention (SA) for feature extraction. However, the computation of SA in most existing transformer-based models is very expensive, and some of the employed operations may be redundant for the SR task. This limits the range of SA computation and consequently the SR performance. In this work, we propose an efficient long-range attention network (ELAN) for image SR. Specifically, we first employ shift convolution (shift-conv) to effectively extract local structural information from the image while maintaining the same level of complexity as a 1x1 convolution. We then propose a group-wise multi-scale self-attention (GMSA) module, which calculates SA on non-overlapping groups of features using different window sizes to exploit long-range image dependencies. A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-conv layers with a GMSA module, which is further accelerated by a shared attention mechanism. Without bells and whistles, our ELAN follows a fairly simple design, sequentially cascading ELABs. Extensive experiments demonstrate that ELAN achieves better results than transformer-based SR models with significantly lower complexity. The source code can be found at https://github.com/xindongzhang/ELAN.
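The shift convolution mentioned in the abstract can be sketched as follows: channel groups are shifted in different spatial directions and the result is mixed by a 1x1 convolution, so spatial context is captured at the cost of a pointwise convolution. This is a minimal illustrative sketch; the five-way group split, shift order, and zero-padding here are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def shift_conv(x, w):
    """Sketch of shift convolution: shift four channel groups in four
    spatial directions (one group unshifted), then apply a 1x1 conv.
    x: (C, H, W) feature map; w: (C_out, C) 1x1 conv weights.
    Group sizes and zero-padding are illustrative assumptions."""
    C, H, W = x.shape
    g = C // 5
    y = np.zeros_like(x)
    y[:g, :, 1:] = x[:g, :, :-1]            # shift right
    y[g:2*g, :, :-1] = x[g:2*g, :, 1:]      # shift left
    y[2*g:3*g, 1:, :] = x[2*g:3*g, :-1, :]  # shift down
    y[3*g:4*g, :-1, :] = x[3*g:4*g, 1:, :]  # shift up
    y[4*g:] = x[4*g:]                       # identity group
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum('oc,chw->ohw', w, y)
```

Because the spatial mixing is done by free shifts, the learnable cost is exactly that of a 1x1 convolution, matching the complexity claim in the abstract.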
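The group-wise multi-scale self-attention (GMSA) idea can likewise be sketched: channels are split into groups, each group computes self-attention within non-overlapping windows of a different size, and the groups are concatenated. This is a hedged sketch under simplifying assumptions: learned Q/K/V projections are omitted (Q = K = V = the input), and H and W are assumed divisible by every window size.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gmsa(x, window_sizes):
    """Sketch of group-wise multi-scale self-attention.
    x: (C, H, W); one channel group per window size. Within each group,
    SA is computed over tokens inside each non-overlapping window."""
    C, H, W = x.shape
    g = C // len(window_sizes)
    outs = []
    for i, ws in enumerate(window_sizes):
        xg = x[i * g:(i + 1) * g]            # (g, H, W) channel group
        yg = np.zeros_like(xg)
        for r in range(0, H, ws):
            for c in range(0, W, ws):
                # Tokens are the ws*ws pixels of the window, g-dim features.
                win = xg[:, r:r + ws, c:c + ws].reshape(g, -1)
                attn = softmax(win.T @ win / np.sqrt(g))   # (N, N) affinities
                yg[:, r:r + ws, c:c + ws] = (win @ attn.T).reshape(g, ws, ws)
        outs.append(yg)
    return np.concatenate(outs, axis=0)
```

Larger windows in some groups extend the attention range, while smaller windows in others keep the overall cost down, which is how the module trades range against complexity.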