Fine-Grained Crowd Counting

Paper Title

Fine-Grained Crowd Counting

Authors

Wan, Jia; Kumar, Nikil Senthil; Chan, Antoni B.

Abstract

Current crowd counting algorithms are concerned only with the total number of people in an image, which lacks low-level, fine-grained information about the crowd. For many practical applications, the total number of people in an image is not as useful as the number of people in each sub-category. For example, knowing the number of people waiting in line or browsing can help retail stores; knowing the number of people standing/sitting can help restaurants/cafeterias; and knowing the number of violent/non-violent people can help police with crowd management. In this paper, we propose fine-grained crowd counting, which divides a crowd into categories based on the low-level behavior attributes of the individuals (e.g., standing/sitting or violent behavior) and then counts the number of people in each category. To enable research in this area, we construct a new dataset of four real-world fine-grained counting tasks: traveling direction on a sidewalk, standing or sitting, waiting in line or not, and exhibiting violent behavior or not. Since the appearance features of different crowd categories are similar, the challenge of fine-grained crowd counting is to effectively utilize contextual information to distinguish between categories. We propose a two-branch architecture, consisting of a density map estimation branch and a semantic segmentation branch. We propose two refinement strategies for improving the predictions of the two branches. First, to encode contextual information, we propose feature propagation guided by the density map prediction, which eliminates the effect of background features during propagation. Second, we propose a complementary attention model to share information between the two branches. Experimental results confirm the effectiveness of our method.
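How a density branch and a segmentation branch can jointly yield per-category counts may be easier to see with a minimal sketch. It assumes (this is an illustration, not taken from the paper) that the segmentation branch outputs, for every pixel, a probability for each category that sums to 1, and that each category's count is the density map summed after weighting by that category's probability map. Because the weights sum to 1 at every pixel, the category counts add up to the ordinary total count. The function name `fine_grained_counts`, the category names, and the use of plain Python lists in place of real network outputs are all hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]  # a 2-D map stored as rows of floats


def fine_grained_counts(density: Grid, seg_probs: Dict[str, Grid]) -> Dict[str, float]:
    """Split a crowd density map into per-category counts (illustrative sketch).

    density[y][x]: predicted people per pixel; summing it gives the total count.
    seg_probs[c][y][x]: assumed probability that the person at (y, x) belongs to
    category c, with probabilities summing to 1 at each pixel.
    """
    counts = {c: 0.0 for c in seg_probs}
    for y, row in enumerate(density):
        for x, d in enumerate(row):
            for c, probs in seg_probs.items():
                # Assumption: category density = total density x class probability.
                counts[c] += d * probs[y][x]
    return counts


# Toy 2x2 example with two hypothetical categories.
density = [[0.5, 0.5], [1.0, 0.0]]  # total count = 2.0
seg = {
    "standing": [[1.0, 0.0], [0.25, 0.5]],
    "sitting": [[0.0, 1.0], [0.75, 0.5]],
}
print(fine_grained_counts(density, seg))  # {'standing': 0.75, 'sitting': 1.25}
```

In this toy example the two category counts (0.75 and 1.25) sum to 2.0, the same total that the density map alone would give, so the segmentation only redistributes the count among categories.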
