Paper title
CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability
Paper authors
Paper abstract
Convolutional neural networks (CNNs) have dominated computer vision for nearly a decade because of their strong ability to learn local features. However, their limited receptive field prevents CNNs from modeling global context. In contrast, the transformer, an attention-based architecture, can model global context easily. Despite this, few studies have investigated the effectiveness of transformers in crowd counting. In addition, most existing crowd counting methods are based on the regression of density maps, which requires point-level annotation of each person present in the scene. This annotation task is laborious and error-prone, which has led to increased focus on weakly-supervised crowd counting methods that require only count-level annotations. In this paper, we propose a weakly-supervised method for crowd counting using a pyramid vision transformer. We have conducted extensive evaluations to validate the effectiveness of the proposed method. Our method is comparable to the state of the art on the benchmark crowd datasets. More importantly, it shows remarkable generalizability.
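To make the distinction between point-level and count-level annotation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical annotation format (a list of head coordinates per image) and assumes that the model emits one scalar count per image, trained against the annotated count with an L1 loss. The abstract does not state the loss or the output head, so both are illustrative choices, and the names `count_from_points` and `count_l1_loss` are invented for this example.

```python
from typing import Sequence, Tuple

# Hypothetical annotation format: one (x, y) head location per person.
Point = Tuple[float, float]


def count_from_points(points: Sequence[Point]) -> int:
    """Collapse a point-level annotation into the count-level label it implies."""
    return len(points)


def count_l1_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and annotated counts over a batch.

    Assumption: an L1 objective on the scalar count; the abstract does not
    specify which loss the paper uses.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not predicted:
        raise ValueError("batch must not be empty")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


# Point-level supervision: every person is marked individually.
point_labels = [
    [(12.0, 40.5), (88.2, 19.0), (45.1, 63.3)],
    [(5.0, 5.0)],
]

# Count-level supervision: a single number per image is all that is kept.
count_labels = [float(count_from_points(points)) for points in point_labels]

# Hypothetical scalar outputs of a counting model, one per image.
predicted_counts = [2.5, 1.5]

loss = count_l1_loss(predicted_counts, count_labels)
print(loss)
```

In this setup the training signal never refers to where people are in the image, only to how many there are, which is what removes the need for per-person annotation.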