Multitask Non-Autoregressive Model for Human Motion Prediction

Paper Title

Multitask Non-Autoregressive Model for Human Motion Prediction

Authors

Bin Li, Jian Tian, Zhongfei Zhang, Hailin Feng, Xi Li

Abstract

Human motion prediction, which aims to predict future human skeletons given past ones, is a typical sequence-to-sequence problem. Extensive effort has therefore gone into exploring different RNN-based encoder-decoder architectures. However, because these models generate each target pose conditioned on previously generated ones, they are prone to issues such as error accumulation. In this paper, we argue that this issue is mainly caused by autoregressive decoding. Hence, a novel Non-auToregressive Model (NAT) is proposed with a fully non-autoregressive decoding scheme, together with a context encoder and a positional encoding module. More specifically, the context encoder embeds the given poses from temporal and spatial perspectives. The frame decoder is responsible for predicting each future pose independently. The positional encoding module injects a positional signal into the model to indicate temporal order. Moreover, a multitask training paradigm is presented for both low-level human skeleton prediction and high-level human action recognition, resulting in a convincing improvement on the prediction task. Our approach is evaluated on the Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
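The contrast the abstract draws between autoregressive and non-autoregressive decoding can be illustrated with a toy sketch. This is not the authors' implementation: the sinusoidal form of `positional_encoding` is an assumption borrowed from Transformer-style models (the abstract does not specify the encoding), and `toy_frame_fn`, the context vector, and all dimensions are hypothetical placeholders standing in for learned modules.

```python
import math


def positional_encoding(position, dim):
    """Sinusoidal positional code (assumed form; not specified in the abstract)."""
    enc = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def decode_autoregressive(last_pose, step_fn, horizon):
    """Feed each prediction back as the next input, so errors can compound."""
    poses, current = [], last_pose
    for _ in range(horizon):
        current = step_fn(current)
        poses.append(current)
    return poses


def decode_non_autoregressive(context, frame_fn, horizon, dim):
    """Predict every future frame from the context and its own positional code only."""
    return [
        frame_fn(context, positional_encoding(t, dim))
        for t in range(1, horizon + 1)
    ]


def toy_frame_fn(context, pos_code):
    """Hypothetical stand-in for a learned frame decoder."""
    return [c + p for c, p in zip(context, pos_code)]


if __name__ == "__main__":
    context = [0.0] * 4  # hypothetical context embedding
    frames = decode_non_autoregressive(context, toy_frame_fn, horizon=5, dim=4)
    print(len(frames), "frames decoded independently")
```

In the non-autoregressive variant, frame `t` never reads frame `t - 1`, so a poor prediction at one time step cannot propagate to later ones; temporal order is conveyed only through the positional code.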
