Paper Title

Adaptive Attention Span in Computer Vision

Authors

Jerrod Parker, Shakti Kumar, Joe Roussy

Abstract

Recent developments in Transformers for language modeling have opened new areas of research in computer vision. Results from late 2019 showed vast performance increases in both object detection and recognition when convolutions are replaced by local self-attention kernels. Models using local self-attention kernels were also shown to have fewer parameters and FLOPs compared to equivalent architectures that only use convolutions. In this work we propose a novel method for learning the local self-attention kernel size. We then compare its performance to fixed-size local attention and convolution kernels. The code for all our experiments and models is available at https://github.com/JoeRoussy/adaptive-attention-in-cv
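The core idea of a learnable attention span can be sketched with the soft masking function from Sukhbaatar et al.'s adaptive attention span work for language modeling, which the abstract's approach builds on: attention weights are multiplied by a ramp mask that is 1 within a learned span z, decays linearly over a fixed ramp width, and is 0 beyond it, so z can be trained by gradient descent. The sketch below is our own NumPy illustration, not the authors' implementation; the function names and the `ramp` parameter are assumptions.

```python
import numpy as np

def span_mask(distances, z, ramp=4.0):
    # Soft mask: 1.0 for positions within the learned span z,
    # linearly decaying to 0.0 over `ramp` extra positions, 0.0 beyond.
    # Because the mask is piecewise-linear in z, gradients w.r.t. z are
    # well-defined almost everywhere (an autodiff framework would use this).
    return np.clip((ramp + z - distances) / ramp, 0.0, 1.0)

def masked_attention(scores, distances, z, ramp=4.0):
    # Apply the soft span mask to raw attention scores, then renormalize
    # so the masked weights still sum to (approximately) 1.
    m = span_mask(distances, z, ramp)
    w = np.exp(scores - scores.max()) * m
    return w / (w.sum() + 1e-8)

# Example: distances 0..7 from the query position, learned span z = 2.
mask = span_mask(np.arange(8.0), z=2.0)
# Positions 0-2 are fully attended, 3-5 are partially masked, 6-7 are cut off.
```

In the 2D local-attention setting described in the abstract, `distances` would be spatial offsets within the attention window, and each head could learn its own z, shrinking or growing its effective kernel size during training.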
