Paper Title


FsaNet: Frequency Self-attention for Semantic Segmentation

Paper Authors

Fengyu Zhang, Ashkan Panahi, Guangjun Gao

Paper Abstract


Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process is applied only to low-frequency components. Through an ablation study, we show that low-frequency self-attention can achieve performance very close to, or better than, full-frequency self-attention, even without retraining the network. Accordingly, we design a novel plug-and-play module and embed it in the head of a CNN, yielding a network we refer to as FsaNet. The frequency self-attention 1) requires only a few low-frequency coefficients as input, 2) can be mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token mapping ($1\times1$ convolution) and token mixing stages simultaneously. We show that frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test dataset and competitive results on ADE20K and VOCaug. FsaNet can also enhance Mask R-CNN for instance segmentation on COCO. In addition, using the proposed module, Segformer can be boosted across a series of models with different scales, and Segformer-B5 can be improved even without retraining. Code is available at \url{https://github.com/zfy-csu/FsaNet}.
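For intuition, below is a minimal, single-head sketch of the low-frequency self-attention idea described in the abstract: the feature map is transformed with an orthonormal 2D DCT, softmax attention is applied only to the top-left $k \times k$ low-frequency block, and the result is transformed back to the spatial domain. The module name `LowFreqSelfAttention`, the default `k=8`, and the plain softmax attention are illustrative assumptions, not the authors' implementation (see the linked repository for the actual FsaNet code).

```python
# Minimal sketch (PyTorch) of low-frequency self-attention.
# NOTE: illustrative only -- not the authors' implementation.
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of size n x n (rows = frequencies)."""
    pos = torch.arange(n, dtype=torch.float32)
    basis = torch.cos(math.pi * (2 * pos[None, :] + 1) * pos[:, None] / (2 * n))
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)


class LowFreqSelfAttention(nn.Module):
    """Single-head self-attention over the k x k low-frequency DCT block."""

    def __init__(self, channels: int, k: int = 8):
        super().__init__()
        self.k = k
        self.qkv = nn.Linear(channels, 3 * channels, bias=False)
        self.proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k = min(self.k, h, w)
        dh, dw = dct_matrix(h).to(x), dct_matrix(w).to(x)

        # 2D DCT per channel: F = D_h @ X @ D_w^T.
        freq = dh @ x @ dw.transpose(0, 1)

        # Keep only the k*k low-frequency coefficients as tokens: (b, k*k, c).
        tokens = freq[:, :, :k, :k].reshape(b, c, k * k).transpose(1, 2)

        # Plain softmax self-attention; cheap because there are only k*k tokens.
        q, key, v = self.qkv(tokens).chunk(3, dim=-1)
        attn = torch.softmax(q @ key.transpose(-2, -1) / math.sqrt(c), dim=-1)
        out = self.proj(attn @ v)

        # Write the attended block back and invert the DCT
        # (orthonormal basis, so the inverse is the transpose).
        # A residual connection around the module is omitted for brevity.
        freq = freq.clone()
        freq[:, :, :k, :k] = out.transpose(1, 2).reshape(b, c, k, k)
        return dh.transpose(0, 1) @ freq @ dw
```

As a rough usage example, `LowFreqSelfAttention(channels=512)(torch.randn(2, 512, 64, 64))` attends over only $8 \times 8 = 64$ tokens instead of $64 \times 64 = 4096$ spatial positions, which is where the memory and FLOP savings reported in the abstract come from.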
