Paper Title
Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events
Paper Authors
Paper Abstract
Along with the development of modern smart cities, human-centric video analysis faces the challenge of analyzing diverse and complex events in real scenes. A complex event involves dense crowds and anomalous individual or collective behaviors. However, limited by the scale and coverage of existing video datasets, few human analysis approaches have reported their performance on such complex events. To this end, we present a new large-scale dataset with comprehensive annotations, named Human-in-Events or HiEve (Human-centric video analysis in complex Events), for understanding human motions, poses, and actions in a variety of realistic events, especially crowded and complex events. It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest collections of long-duration trajectories (with an average trajectory length of >480 frames). Based on its diverse annotations, we present two simple baselines for action recognition and pose estimation, respectively. They leverage cross-label information during training to enhance feature learning in the corresponding visual tasks. Experiments show that they can boost the performance of existing action recognition and pose estimation pipelines. More importantly, they demonstrate that the wide-ranging annotations in HiEve can improve various video tasks. Furthermore, we conduct extensive experiments to benchmark recent video analysis approaches together with our baseline methods, demonstrating that HiEve is a challenging dataset for human-centric video analysis. We expect the dataset to advance the development of cutting-edge techniques in human-centric analysis and the understanding of complex events. The dataset is available at http://humaninevents.org.