Paper Title
UDE: A Unified Driving Engine for Human Motion Generation
Paper Authors
Paper Abstract
Generating controllable and editable human motion sequences is a key challenge in 3D avatar generation. Generating and animating human motion has long been labor-intensive, until learning-based approaches were developed and applied recently. However, these approaches are still task-specific or modality-specific\cite{ahuja2019language2pose}\cite{ghosh2021synthesis}\cite{ferreira2021learning}\cite{li2021ai}. In this paper, we propose ``UDE'', the first unified driving engine that enables generating human motion sequences from natural language or audio sequences (see Fig.~\ref{fig:teaser}). Specifically, UDE consists of the following key components: 1) a motion quantization module based on VQVAE that represents continuous motion sequences as discrete latent codes\cite{van2017neural}, 2) a modality-agnostic transformer encoder\cite{vaswani2017attention} that learns to map modality-aware driving signals to a joint space, 3) a unified token transformer (GPT-like\cite{radford2019language}) network that predicts the quantized latent code indices in an auto-regressive manner, and 4) a diffusion motion decoder that takes the predicted motion tokens as input and decodes them into motion sequences with high diversity. We evaluate our method on the HumanML3D\cite{Guo_2022_CVPR} and AIST++\cite{li2021learn} benchmarks, and the experimental results demonstrate that our method achieves state-of-the-art performance. Project website: \url{https://github.com/zixiangzhou916/UDE/}
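To make the four-component architecture described above concrete, the following is a minimal illustrative sketch in PyTorch. It is not the authors' implementation: all dimensions, layer counts, module names, and the masking scheme are assumptions chosen only for illustration, and positional embeddings, training losses, and the diffusion decoder itself are omitted.

```python
# Illustrative sketch of the four UDE components (assumed shapes and sizes, not the paper's code).
import torch
import torch.nn as nn

class MotionVQVAE(nn.Module):
    """1) Motion quantization: map continuous motion frames to discrete codebook indices."""
    def __init__(self, motion_dim=263, latent_dim=512, codebook_size=1024):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(motion_dim, latent_dim), nn.ReLU(),
                                     nn.Linear(latent_dim, latent_dim))
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def quantize(self, motion):                                    # motion: (B, T, motion_dim)
        z = self.encoder(motion)                                   # (B, T, latent_dim)
        dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        return dist.argmin(dim=-1)                                 # discrete token indices: (B, T)

class ModalityAgnosticEncoder(nn.Module):
    """2) Project text or audio features into a joint space shared by both modalities."""
    def __init__(self, text_dim=768, audio_dim=438, joint_dim=512, num_layers=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, joint_dim)
        self.audio_proj = nn.Linear(audio_dim, joint_dim)
        layer = nn.TransformerEncoderLayer(d_model=joint_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feats, modality):                            # feats: (B, L, text/audio dim)
        proj = self.text_proj if modality == "text" else self.audio_proj
        return self.encoder(proj(feats))                           # joint-space condition: (B, L, joint_dim)

class UnifiedTokenTransformer(nn.Module):
    """3) GPT-like network that predicts motion-token indices auto-regressively."""
    def __init__(self, codebook_size=1024, joint_dim=512, num_layers=6):
        super().__init__()
        self.token_emb = nn.Embedding(codebook_size + 1, joint_dim)  # +1 for a start token
        layer = nn.TransformerDecoderLayer(d_model=joint_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(joint_dim, codebook_size)

    def forward(self, tokens, condition):                          # tokens: (B, T'), condition: (B, L, joint_dim)
        x = self.token_emb(tokens)
        T = x.size(1)
        causal_mask = torch.full((T, T), float("-inf"), device=x.device).triu(1)
        h = self.decoder(x, condition, tgt_mask=causal_mask)
        return self.head(h)                                        # logits over codebook indices

# 4) The diffusion motion decoder (not sketched here) would condition on the predicted
#    token sequence and iteratively denoise a motion sequence, which is what yields
#    decoded motions with high diversity.
```

Read this way, the design rationale in the abstract becomes clearer: discretizing motion into codebook tokens lets a single GPT-like predictor serve both driving modalities, while the modality-agnostic encoder maps text and audio into one joint conditioning space so the same token transformer can be driven by either.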