Paper Title
Weakly-Supervised Multi-Person Action Recognition in 360$^{\circ}$ Videos
Paper Authors
Paper Abstract
The recent development of commodity 360$^{\circ}$ cameras has enabled a single video to capture an entire scene, which holds promising potential for surveillance scenarios. However, research in omnidirectional video analysis has lagged behind the hardware advances. In this work, we address the important problem of action recognition in top-view 360$^{\circ}$ videos. Due to the wide field of view, 360$^{\circ}$ videos usually capture multiple people performing actions at the same time. Furthermore, the appearance of people is deformed. The proposed framework first transforms omnidirectional videos into panoramic videos, then extracts spatio-temporal features using region-based 3D CNNs for action recognition. We propose a weakly-supervised method based on multi-instance multi-label learning, which trains the model to recognize and localize multiple actions in a video using only video-level action labels as supervision. We perform experiments to quantitatively validate the efficacy of the proposed method and qualitatively demonstrate action localization results. To enable research in this direction, we introduce 360Action, the first omnidirectional video dataset for multi-person action recognition.
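To make the weak-supervision idea in the abstract concrete, below is a minimal sketch (not the authors' code) of how region-level action scores from a region-based 3D CNN could be aggregated into a video-level prediction so that only video-level multi-label annotations are needed for training. The aggregation by max-pooling, the class count, and all names (`VideoLevelAggregator`, `weak_supervision_loss`, `NUM_CLASSES`) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 19  # hypothetical number of action classes

class VideoLevelAggregator(nn.Module):
    """Aggregates per-region scores into video-level scores for weak supervision."""

    def forward(self, region_scores: torch.Tensor) -> torch.Tensor:
        # region_scores: (batch, num_regions, NUM_CLASSES) raw logits, e.g. produced
        # by a region-based 3D CNN on person regions of the panoramic video.
        # Max over regions encodes the multi-instance assumption: a class is present
        # in the video if at least one region strongly supports it.
        video_logits, _ = region_scores.max(dim=1)
        return video_logits

def weak_supervision_loss(region_scores: torch.Tensor,
                          video_labels: torch.Tensor) -> torch.Tensor:
    """Multi-label BCE between aggregated predictions and video-level labels."""
    video_logits = VideoLevelAggregator()(region_scores)
    return nn.functional.binary_cross_entropy_with_logits(video_logits, video_labels)

# Usage: 2 videos, 5 person regions each, multi-hot video-level labels.
scores = torch.randn(2, 5, NUM_CLASSES)
labels = torch.zeros(2, NUM_CLASSES)
labels[0, [1, 4]] = 1.0  # video 0 contains actions 1 and 4
labels[1, 7] = 1.0       # video 1 contains action 7
loss = weak_supervision_loss(scores, labels)
```

Under this kind of formulation, localization falls out as a by-product: at test time, the per-region logits indicate which regions carry each predicted action, even though no region-level labels were used in training.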