Paper Title
Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction
Paper Authors
Paper Abstract
Stochastic human motion prediction aims to forecast multiple plausible future motions given a single pose sequence from the past. Most previous works focus on designing elaborate losses to improve accuracy, while diversity is typically obtained by randomly sampling a set of latent variables from the latent prior and decoding them into possible motions. This joint training of sampling and decoding, however, suffers from posterior collapse: the learned latent variables tend to be ignored by a strong decoder, which limits diversity. As an alternative, inspired by the diffusion process in nonequilibrium thermodynamics, we propose MotionDiff, a diffusion probabilistic model that treats the kinematics of human joints as heated particles diffusing from their original states to a noise distribution. This process offers a natural way to obtain "whitened" latents without any trainable parameters, and human motion prediction can be regarded as the reverse diffusion process that converts the noise distribution into realistic future motions conditioned on the observed sequence. Specifically, MotionDiff consists of two parts: a spatial-temporal transformer-based diffusion network that generates diverse yet plausible motions, and a graph convolutional network that further refines the outputs. Experimental results on two datasets demonstrate that our model achieves competitive performance in terms of both accuracy and diversity.
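The parameter-free forward diffusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard Gaussian forward process used by denoising diffusion probabilistic models, q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I). The linear noise schedule, the step count, and the toy joint coordinates below are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; nothing here is learned."""
    alpha_bar = alpha_bars[t]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Hypothetical flattened joint coordinates for a single pose (two 3D joints).
x0 = [0.1, -0.3, 0.5, 0.2, 0.0, -0.4]

x_early = diffuse(x0, 0, alpha_bars, rng)    # almost identical to x0
x_late = diffuse(x0, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

Because `alpha_bars` shrinks toward zero as `t` grows, late samples carry almost no information about the original pose. This is the sense in which the forward process yields "whitened" latents without training, while the learned network is only needed for the reverse, denoising direction.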