Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Paper title

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Authors

Pongpisit Thanasutives, Ken-ichi Fukui, Masayuki Numao, Boonserm Kijsirikul

Abstract

In this paper, we propose two modified neural networks based on dual path multi-scale fusion networks (SFANet) and SegNet for accurate and efficient crowd counting. Inspired by SFANet, the first model, named M-SFANet, is augmented with atrous spatial pyramid pooling (ASPP) and a context-aware module (CAN). The encoder of M-SFANet is enhanced with ASPP, which contains parallel atrous convolutional layers with different sampling rates and is therefore able to extract multi-scale features of the target object and incorporate a larger context. To further deal with scale variation throughout an input image, we leverage the CAN module, which adaptively encodes the scales of the contextual information. The combination yields an effective model for counting in both dense and sparse crowd scenes. Based on the SFANet decoder structure, M-SFANet's decoder has dual paths, one for density map generation and one for attention map generation. The second model, called M-SegNet, is produced by replacing the bilinear upsampling in SFANet with the max unpooling used in SegNet. This change yields a faster model while maintaining competitive counting performance. Designed for high-speed surveillance applications, M-SegNet has no additional multi-scale-aware module so as not to increase its complexity. Both models are encoder-decoder based architectures and are end-to-end trainable. We conduct extensive experiments on five crowd counting datasets and one vehicle counting dataset to show that these modifications yield algorithms that could improve state-of-the-art crowd counting methods. Code is available at https://github.com/Pongpisit-Thanasutives/Variations-of-SFANet-for-Crowd-Counting.
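The ASPP idea in the abstract (parallel atrous convolutions with different sampling rates) can be illustrated with a small pure-Python sketch. It assumes a single-channel feature map stored as nested lists, reuses one 3x3 kernel for every branch, and picks the dilation rates (1, 2, 3) purely for illustration; these are not the rates or weights used in M-SFANet. The helpers `atrous_conv2d`, `effective_kernel_size`, and `aspp_branches` are hypothetical and are not taken from the authors' repository.

```python
from typing import List, Sequence

Grid = List[List[float]]


def atrous_conv2d(x: Grid, kernel: Grid, rate: int) -> Grid:
    """Atrous (dilated) k x k convolution with zero padding and stride 1.

    Like convolution layers in deep-learning frameworks, this computes a
    cross-correlation (the kernel is not flipped). The output has the same
    height and width as the input.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    c = k // 2  # centre tap of an odd-sized kernel
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Taps are spaced `rate` pixels apart around (i, j).
                    y = i + (u - c) * rate
                    z = j + (v - c) * rate
                    if 0 <= y < h and 0 <= z < w:  # outside the image counts as 0
                        acc += kernel[u][v] * x[y][z]
            out[i][j] = acc
    return out


def effective_kernel_size(k: int, rate: int) -> int:
    """Side length covered by a k x k kernel dilated by `rate`."""
    return k + (k - 1) * (rate - 1)


def aspp_branches(x: Grid, kernel: Grid, rates: Sequence[int] = (1, 2, 3)) -> List[Grid]:
    """Run parallel atrous branches and return their outputs as channels.

    Simplification (assumption): real ASPP layers typically learn separate
    weights per branch and then fuse the branch outputs, e.g. by
    concatenation followed by a learned 1x1 convolution.
    """
    return [atrous_conv2d(x, kernel, r) for r in rates]


# Usage: a single bright pixel shows which positions each branch samples.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
for r, branch in zip((1, 2, 3), aspp_branches(impulse, ones)):
    hits = [(i, j) for i in range(7) for j in range(7) if branch[i][j] != 0.0]
    print(f"rate={r} span={effective_kernel_size(3, r)} hits={hits}")
```

Every branch keeps the input's spatial size, while the footprint of the same 3x3 kernel grows to 5x5 and 7x7 as the rate increases. This is how parallel atrous branches can look at each location at several scales without adding weights to the kernel.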
