Paper Title
QKVA grid: Attention in Image Perspective and Stacked DETR
Paper Authors
Paper Abstract
We present a new model named Stacked-DETR (SDETR), which inherits the main ideas of canonical DETR. We improve DETR in two directions: simplifying the cost of training and introducing a stacked architecture to enhance performance. For the former, we focus on the inside of the Attention block and propose the QKVA grid, a new perspective for describing the attention process. With it, we can take a step further in understanding how Attention works on image problems and the effect of multi-head attention. These two ideas contribute to the design of a single-head encoder layer. For the latter, SDETR reaches better performance (+0.6 AP, +2.7 APs) than DETR. In particular, on small objects, SDETR achieves better results than the optimized Faster R-CNN baseline, which was a shortcoming of DETR. Our changes are based on the code of DETR. Training code and pretrained models are available at https://github.com/shengwenyuan/sdetr.
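For context, below is a minimal sketch of the standard single-head scaled dot-product attention that the QKVA grid reinterprets. It is a generic illustration under our own assumptions (function name, tensor shapes), not code from the SDETR repository:

```python
# Minimal single-head scaled dot-product attention, as used in a
# Transformer encoder layer. Names and shapes are illustrative
# assumptions, not taken from the SDETR code.
import math
import torch

def single_head_attention(q, k, v):
    """q, k: (n, d); v: (n, d_v). Returns (n, d_v).

    For an image problem, n is typically the H*W flattened feature-map
    positions and d the embedding dimension, so the (n, n) score matrix
    relates every spatial position to every other position.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (n, n) grid of Q-K similarities
    weights = scores.softmax(dim=-1)                 # each row sums to 1
    return weights @ v                               # weighted sum of values

# Toy usage: an 8x8 feature map flattened to 64 tokens of dimension 32.
x = torch.randn(64, 32)
out = single_head_attention(x, x, x)  # self-attention
print(out.shape)  # torch.Size([64, 32])
```

A single-head encoder layer, as discussed in the abstract, uses one such attention map per layer instead of splitting the embedding across multiple parallel heads.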