Paper Title
Deep Reinforcement Learning for Active Human Pose Estimation
Paper Authors
Paper Abstract
Most 3D human pose estimation methods assume that input -- be it images of a scene collected from one or several viewpoints, or from a video -- is given. Consequently, they focus on estimates leveraging prior knowledge and measurements by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially -- in 'time-freeze' mode -- and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture which learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators, with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.
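To make the described pipeline concrete, below is a minimal sketch (not the authors' code) of an active view-selection loop in the spirit of Pose-DRL: at a frozen time step, a policy repeatedly picks a camera to query, a monocular estimator produces a 3D pose from that view, the per-view estimates are fused, and a learned stop action ends exploration before moving to the next frame. All names here (RandomViewPolicy, MonocularEstimator, fuse_poses, max_views) are hypothetical placeholders, and the random policy simply stands in for the trained RL agent.

```python
import numpy as np

NUM_JOINTS = 15   # assumed joint count for the 3D pose
STOP = -1         # sentinel action: stop exploring the current frame


class MonocularEstimator:
    """Stand-in for the underlying single-view 3D pose estimator."""
    def estimate(self, view_image: np.ndarray) -> np.ndarray:
        # A real estimator would run a deep network on the image;
        # here we just return noise of the right shape.
        return np.random.randn(NUM_JOINTS, 3)


class RandomViewPolicy:
    """Stand-in for the learned deep RL policy over viewpoints."""
    def select_view(self, state, remaining_views, budget_left: int) -> int:
        if budget_left == 0 or not remaining_views:
            return STOP
        # A trained policy would score candidate views from the state;
        # we pick randomly and stop with small probability, mimicking
        # the learned automatic stopping condition.
        if np.random.rand() < 0.2:
            return STOP
        return int(np.random.choice(list(remaining_views)))


def fuse_poses(poses):
    """Naive fusion of per-view estimates (a placeholder for proper fusion)."""
    return np.mean(np.stack(poses), axis=0)


def active_pose_estimation(frame_views, policy, estimator, max_views=5):
    """Process one 'time-freeze' frame: pick informative views, then fuse."""
    remaining = set(range(len(frame_views)))
    estimates, state = [], None
    for budget_left in range(max_views, 0, -1):
        action = policy.select_view(state, remaining, budget_left)
        if action == STOP:
            break
        remaining.discard(action)
        estimates.append(estimator.estimate(frame_views[action]))
        state = fuse_poses(estimates)  # running fusion becomes the new state
    return fuse_poses(estimates) if estimates else None


if __name__ == "__main__":
    # Ten candidate cameras, each producing a dummy 256x256 RGB image.
    views = [np.zeros((256, 256, 3)) for _ in range(10)]
    fused = active_pose_estimation(views, RandomViewPolicy(), MonocularEstimator())
    print(None if fused is None else fused.shape)
```

In the paper the stop and view-selection decisions are produced by the trained agent and rewarded by the accuracy of the fused estimate; this sketch only illustrates the control flow of querying views one at a time within a per-frame budget.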