Paper Title

Matching Feature Sets for Few-Shot Image Classification

Paper Authors

Arman Afrasiyabi, Hugo Larochelle, Jean-François Lalonde, Christian Gagné

Paper Abstract

In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets -- namely miniImageNet, tieredImageNet, and CUB -- in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.
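To make the idea concrete, below is a minimal sketch of how an encoder can be adapted to emit a set of feature vectors per image via a shallow attention head, and how a set-to-set score could drive classification. This is not the authors' released SetFeat implementation: it assumes PyTorch, and the names `SetFeatureExtractor`, `set_to_set_similarity`, the learned-query attention head, and the chamfer-style similarity are all illustrative stand-ins for whatever attention modules and matching metric the paper actually uses.

```python
# Illustrative sketch only (assumes PyTorch): a conv backbone augmented with a
# shallow attention head that outputs a SET of feature vectors per image, and a
# simple chamfer-style set-to-set similarity as a stand-in matching metric.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SetFeatureExtractor(nn.Module):
    """Hypothetical set-based encoder: backbone feature map -> set of vectors."""

    def __init__(self, backbone: nn.Module, feat_dim: int, set_size: int):
        super().__init__()
        # Assumes the backbone returns a (B, feat_dim, H, W) feature map.
        self.backbone = backbone
        # Learned query vectors; each query attends over the spatial tokens
        # and produces one element of the output set.
        self.queries = nn.Parameter(torch.randn(set_size, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=1, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(x)                      # (B, C, H, W)
        tokens = fmap.flatten(2).transpose(1, 2)     # (B, H*W, C) spatial tokens
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        feats, _ = self.attn(q, tokens, tokens)      # (B, set_size, C)
        return F.normalize(feats, dim=-1)            # unit-norm set elements


def set_to_set_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric chamfer-style match between two sets of unit vectors,
    shapes (n, d) and (m, d); higher means more similar."""
    sim = a @ b.t()                                  # pairwise cosine similarities
    return 0.5 * (sim.max(dim=1).values.mean() + sim.max(dim=0).values.mean())
```

In a few-shot episode, one would score a query image's feature set against each class's support set(s) with a metric like `set_to_set_similarity` and predict the best-matching class; the added parameters are only the queries and one attention layer, so the encoder stays close to its original size.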
