Paper Title
UDE: A Unified Driving Engine for Human Motion Generation
Paper Authors
Paper Abstract
Generating controllable and editable human motion sequences is a key challenge in 3D avatar generation. Generating and animating human motion has long been labor-intensive, until learning-based approaches were developed and applied recently. However, these approaches are still task-specific or modality-specific\cite{ahuja2019language2pose}\cite{ghosh2021synthesis}\cite{ferreira2021learning}\cite{li2021ai}. In this paper, we propose ``UDE'', the first unified driving engine that enables generating human motion sequences from natural language or audio sequences (see Fig.~\ref{fig:teaser}). Specifically, UDE consists of the following key components: 1) a motion quantization module based on VQVAE that represents continuous motion sequences as discrete latent codes\cite{van2017neural}, 2) a modality-agnostic transformer encoder\cite{vaswani2017attention} that learns to map modality-aware driving signals to a joint space, 3) a unified token transformer (GPT-like\cite{radford2019language}) network that predicts the quantized latent code indices in an auto-regressive manner, and 4) a diffusion motion decoder that takes the predicted motion tokens as input and decodes them into motion sequences with high diversity. We evaluate our method on the HumanML3D\cite{Guo_2022_CVPR} and AIST++\cite{li2021learn} benchmarks, and the experimental results demonstrate that our method achieves state-of-the-art performance. Project website: \url{https://github.com/zixiangzhou916/UDE/}
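To make the four-component architecture described above concrete, the following is a minimal illustrative sketch in PyTorch. It is not the authors' implementation: all dimensions, layer counts, module names, and the masking scheme are assumptions chosen only for illustration, and positional embeddings, training losses, and the diffusion decoder itself are omitted.

```python
# Illustrative sketch of the four UDE components (assumed shapes and sizes, not the paper's code).
import torch
import torch.nn as nn

class MotionVQVAE(nn.Module):
    """1) Motion quantization: map continuous motion frames to discrete codebook indices."""
    def __init__(self, motion_dim=263, latent_dim=512, codebook_size=1024):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(motion_dim, latent_dim), nn.ReLU(),
                                     nn.Linear(latent_dim, latent_dim))
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def quantize(self, motion):                                    # motion: (B, T, motion_dim)
        z = self.encoder(motion)                                   # (B, T, latent_dim)
        dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        return dist.argmin(dim=-1)                                 # discrete token indices: (B, T)

class ModalityAgnosticEncoder(nn.Module):
    """2) Project text or audio features into a joint space shared by both modalities."""
    def __init__(self, text_dim=768, audio_dim=438, joint_dim=512, num_layers=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, joint_dim)
        self.audio_proj = nn.Linear(audio_dim, joint_dim)
        layer = nn.TransformerEncoderLayer(d_model=joint_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feats, modality):                            # feats: (B, L, text/audio dim)
        proj = self.text_proj if modality == "text" else self.audio_proj
        return self.encoder(proj(feats))                           # joint-space condition: (B, L, joint_dim)

class UnifiedTokenTransformer(nn.Module):
    """3) GPT-like network that predicts motion-token indices auto-regressively."""
    def __init__(self, codebook_size=1024, joint_dim=512, num_layers=6):
        super().__init__()
        self.token_emb = nn.Embedding(codebook_size + 1, joint_dim)  # +1 for a start token
        layer = nn.TransformerDecoderLayer(d_model=joint_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(joint_dim, codebook_size)

    def forward(self, tokens, condition):                          # tokens: (B, T'), condition: (B, L, joint_dim)
        x = self.token_emb(tokens)
        T = x.size(1)
        causal_mask = torch.full((T, T), float("-inf"), device=x.device).triu(1)
        h = self.decoder(x, condition, tgt_mask=causal_mask)
        return self.head(h)                                        # logits over codebook indices

# 4) The diffusion motion decoder (not sketched here) would condition on the predicted
#    token sequence and iteratively denoise a motion sequence, which is what yields
#    decoded motions with high diversity.
```

Read this way, the design rationale in the abstract becomes clearer: discretizing motion into codebook tokens lets a single GPT-like predictor serve both driving modalities, while the modality-agnostic encoder maps text and audio into one joint conditioning space so the same token transformer can be driven by either.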