Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Paper title

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Authors

Zhi-Qi Cheng, Qi Dai, Hong Li, JingKuan Song, Xiao Wu, Alexander G. Hauptmann

Abstract

Previous work generally holds that improving the spatial invariance of convolutional networks is the key to object counting. However, after examining several mainstream counting networks, we surprisingly found that overly strict pixel-level spatial invariance would cause the density map generation to overfit noise. In this paper, we try to use locally connected Gaussian kernels to replace the original convolution filters to estimate the spatial position in the density map. The purpose is to allow the feature extraction process to potentially stimulate the density map generation process to overcome annotation noise. Inspired by previous work, we propose a low-rank approximation accompanied by translation invariance to favorably implement the approximation of massive Gaussian convolution. Our work points to a new direction for follow-up research, which should investigate how to properly relax the overly strict pixel-level spatial invariance for object counting. We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50). Extensive experiments were conducted on 7 popular benchmarks for 3 applications (i.e., crowd, vehicle, and plant counting). Experimental results show that our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.
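As background for the abstract's terms, the following is a minimal sketch of how a density map is commonly built from point annotations in counting work: one normalized Gaussian per annotated point, so that the map sums to the object count. It is not the method proposed in the paper. The function names (`gaussian_1d`, `density_map`), the fixed `sigma`, and the sample points are assumptions chosen for illustration. The sketch also uses the fact that an isotropic 2D Gaussian is the outer product of two 1D Gaussians, which is the simplest (rank-1) case of a low-rank factorization.

```python
import math


def gaussian_1d(size, center, sigma):
    """Sample a 1D Gaussian at positions 0..size-1, normalized to sum to 1."""
    weights = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
               for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def density_map(height, width, points, sigma=2.0):
    """Add one normalized 2D Gaussian per annotated (row, col) point.

    Each 2D Gaussian is formed as the outer product of a vertical and a
    horizontal 1D Gaussian, so every point contributes exactly 1 to the sum.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for row, col in points:
        gy = gaussian_1d(height, row, sigma)  # vertical profile
        gx = gaussian_1d(width, col, sigma)   # horizontal profile
        for y in range(height):
            for x in range(width):
                dmap[y][x] += gy[y] * gx[x]
    return dmap


# Hypothetical annotations: three objects in a 16 x 32 image.
points = [(5, 5), (10, 20), (12, 22)]
dmap = density_map(16, 32, points)

# Integrating (summing) the density map recovers the object count.
estimated_count = sum(sum(row) for row in dmap)
```

Because every kernel here is normalized over the image grid, `estimated_count` equals the number of annotated points up to floating-point error. A single fixed `sigma` for every point is the simplest choice; the abstract's concern is that fitting such maps too strictly at the pixel level can overfit annotation noise.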
