Paper Title


Volumetric Transformer Networks

Authors

Seungryong Kim, Sabine Süsstrunk, Mathieu Salzmann

Abstract


Existing techniques to encode spatial invariance within deep convolutional neural networks (CNNs) apply the same warping field to all the feature channels. This does not account for the fact that the individual feature channels can represent different semantic parts, which can undergo different spatial transformations w.r.t. a canonical configuration. To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely. We design our VTN as an encoder-decoder network, with modules dedicated to letting the information flow across the feature channels, to account for the dependencies between the semantic parts. We further propose a loss function defined between the warped features of pairs of instances, which improves the localization ability of VTN. Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
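The core idea above is that each feature channel gets its own warping field, instead of one field shared by all channels as in a standard spatial transformer. A minimal NumPy sketch of this channel-wise warping step is below; in the actual VTN the per-channel flows are predicted by a learned encoder-decoder, whereas here they are simply passed in as inputs, and the helper names are hypothetical:

```python
import numpy as np

def bilinear_sample(channel, coords):
    """Bilinearly sample a single (H, W) channel at (y, x) positions in coords (H, W, 2)."""
    H, W = channel.shape
    y = np.clip(coords[..., 0], 0, H - 1)
    x = np.clip(coords[..., 1], 0, W - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.clip(y0 + 1, 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    wy = y - y0
    wx = x - x0
    top = channel[y0, x0] * (1 - wx) + channel[y0, x1] * wx
    bot = channel[y1, x0] * (1 - wx) + channel[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def channelwise_warp(features, flows):
    """Warp each channel with its own flow field.

    features: (C, H, W) feature map.
    flows:    (C, H, W, 2) per-channel (dy, dx) offsets -- one field per channel,
              unlike a spatial transformer, which would use a single (H, W, 2) field.
    """
    C, H, W = features.shape
    grid_y, grid_x = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    base = np.stack([grid_y, grid_x], axis=-1).astype(float)  # identity sampling grid
    warped = np.empty_like(features)
    for c in range(C):
        warped[c] = bilinear_sample(features[c], base + flows[c])
    return warped
```

With all-zero flows this reduces to the identity, and a constant (0, 1) flow on one channel shifts only that channel left by one pixel, which is exactly the extra degree of freedom the paper argues a shared warping field cannot provide.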
