Few-Shot Object Detection with Fully Cross-Transformer

Paper title

Few-Shot Object Detection with Fully Cross-Transformer

Authors

Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang

Abstract

Few-shot object detection (FSOD), with the aim to detect novel objects using very few training examples, has recently attracted great research interest in the community. Metric-learning based methods have been demonstrated to be effective for this task using a two-branch based siamese network, and calculate the similarity between image regions and few-shot examples for detection. However, in previous works, the interaction between the two branches is only restricted in the detection head, while leaving the remaining hundreds of layers for separate feature extraction. Inspired by the recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD by incorporating cross-transformer into both the feature backbone and detection head. The asymmetric-batched cross-attention is proposed to aggregate the key information from the two branches with different batch sizes. Our model can improve the few-shot similarity learning between the two branches by introducing the multi-level interactions. Comprehensive experiments on both PASCAL VOC and MSCOCO FSOD benchmarks demonstrate the effectiveness of our model.
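
The abstract names an asymmetric-batched cross-attention but does not spell out how it is computed. The NumPy sketch below is one plausible reading, not the authors' implementation. It assumes a query branch with batch size 1 and a support branch with batch size `Bs`, and it omits learned projections and multiple heads. Under this reading, the query branch attends over its own tokens plus the support tokens averaged across the support batch, and each support image attends over its own tokens plus the query tokens repeated across the batch. The function names, the mean pooling, and the tensor shapes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v):
    # Scaled dot-product attention.
    # q: (B, Nq, D); k, v: (B, Nk, D) -> output: (B, Nq, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def asymmetric_cross_attention(query_tokens, support_tokens):
    """Hypothetical sketch of cross-attention between branches of unequal batch size.

    query_tokens:   (1, Nq, D)   one query image
    support_tokens: (Bs, Ns, D)  Bs support images of the same class
    Learned Q/K/V projections are omitted (assumption: identity projections).
    """
    bs = support_tokens.shape[0]

    # Query branch: pool the support tokens over the batch axis (assumed mean
    # pooling) so they match the query batch size, then attend over both sets.
    support_pooled = support_tokens.mean(axis=0, keepdims=True)   # (1, Ns, D)
    kv_query = np.concatenate([query_tokens, support_pooled], axis=1)
    query_out = attend(query_tokens, kv_query, kv_query)          # (1, Nq, D)

    # Support branch: repeat the query tokens across the support batch so each
    # support image can attend to the query image.
    query_repeated = np.repeat(query_tokens, bs, axis=0)          # (Bs, Nq, D)
    kv_support = np.concatenate([support_tokens, query_repeated], axis=1)
    support_out = attend(support_tokens, kv_support, kv_support)  # (Bs, Ns, D)

    return query_out, support_out


# Example: one query image with 6 tokens, three support images with 4 tokens each.
rng = np.random.default_rng(0)
query = rng.normal(size=(1, 6, 8))
support = rng.normal(size=(3, 4, 8))
query_out, support_out = asymmetric_cross_attention(query, support)
print(query_out.shape, support_out.shape)
```

Each branch keeps its own batch size and token count in this sketch, so a block like this could in principle be stacked at several depths. That is the sense in which the abstract describes interaction in both the backbone and the detection head.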

Paper title