Paper Title
Invariant Deep Compressible Covariance Pooling for Aerial Scene Categorization
Paper Authors
Paper Abstract
Learning discriminative and invariant feature representations is key to visual image categorization. In this article, we propose a novel invariant deep compressible covariance pooling (IDCCP) method to address nuisance variations in aerial scene categorization. We consider transforming the input image according to a finite transformation group that consists of multiple confounding orthogonal matrices, such as the D4 group. We then adopt a Siamese-style network to transfer the group structure to the representation space, where we can derive a trivial representation that is invariant under the group action. A linear classifier trained on the trivial representation also inherits this invariance. To further improve the discriminative power of the representation, we extend it to the tensor space while imposing orthogonal constraints on the transformation matrix to effectively reduce the feature dimension. We conduct extensive experiments on publicly released aerial scene image data sets and demonstrate the superiority of this method over state-of-the-art methods. In particular, with a ResNet architecture, our IDCCP model can reduce the dimension of the tensor representation by about 98% without meaningfully sacrificing accuracy (a drop of less than 0.5%).
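The two ideas in the abstract, averaging over the D4 group to obtain an invariant representation and compressing a covariance descriptor with an orthogonal matrix, can be illustrated with a small NumPy sketch. This is not the authors' implementation: `toy_features` is a hypothetical stand-in for a shared CNN branch, the sizes `n`, `d`, and `k` are arbitrary, the projection is a random orthonormal matrix rather than a learned one, and the order in which pooling and group averaging are combined is an assumption made for illustration.

```python
import numpy as np


def d4_orbit(image):
    """Apply all 8 elements of D4 (4 rotations, each with and without a flip)."""
    rotations = [np.rot90(image, k) for k in range(4)]
    return rotations + [np.fliplr(r) for r in rotations]


def toy_features(image, weights):
    """Hypothetical stand-in for a shared (Siamese) CNN branch.

    Maps an (n, n) image to a (d, n) matrix whose columns play the role
    of local descriptors.
    """
    return np.maximum(weights @ image, 0.0)


def compressed_covariance(features, projection):
    """Covariance pooling followed by an orthogonal projection.

    features:   (d, N) local descriptors
    projection: (d, k) matrix with orthonormal columns
    returns:    (k, k) compressed second-order descriptor
    """
    centered = features - features.mean(axis=1, keepdims=True)
    covariance = centered @ centered.T / features.shape[1]
    return projection.T @ covariance @ projection


def invariant_representation(image, weights, projection):
    """Average the compressed descriptors over the D4 orbit of the image."""
    descriptors = [
        compressed_covariance(toy_features(g, weights), projection)
        for g in d4_orbit(image)
    ]
    return np.mean(descriptors, axis=0)


rng = np.random.default_rng(0)
n, d, k = 8, 16, 4  # assumed toy sizes, not values from the paper
image = rng.normal(size=(n, n))
weights = rng.normal(size=(d, n))
# Reduced QR gives a (d, k) matrix with orthonormal columns.
projection, _ = np.linalg.qr(rng.normal(size=(d, k)))

z = invariant_representation(image, weights, projection)
z_rot = invariant_representation(np.rot90(image), weights, projection)
z_flip = invariant_representation(np.flipud(image), weights, projection)

# Fraction of entries removed by going from a (d, d) to a (k, k) descriptor.
reduction = 1.0 - (k * k) / (d * d)
```

Because transforming the input by any element of D4 only permutes the images in its orbit, the averaged descriptor is unchanged, so `z`, `z_rot`, and `z_flip` agree up to floating-point error. Any linear classifier applied to `z` therefore gives the same output for all eight transformed versions of the image.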