Paper Title
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
Paper Authors
Paper Abstract
We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled using only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or by blending and scaling style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs for the same input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion, including fingers, with speech, spanning 19 different styles.
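The "blending and scaling of style embeddings" mentioned above can be pictured with a minimal sketch (not the authors' code). It assumes a hypothetical style_encoder that maps a short example motion clip to a latent style vector, and a hypothetical generator conditioned on speech features and that vector; both names are placeholders for illustration only.

    # Minimal sketch of style-embedding blending and scaling (assumed API, PyTorch)
    import torch

    def blend_styles(z_a: torch.Tensor, z_b: torch.Tensor, alpha: float) -> torch.Tensor:
        # Linearly interpolate two style embeddings; alpha in [0, 1]
        return (1.0 - alpha) * z_a + alpha * z_b

    def scale_style(z: torch.Tensor, scale: float) -> torch.Tensor:
        # Exaggerate (scale > 1) or attenuate (scale < 1) a style
        return scale * z

    # Usage (all names hypothetical):
    # z_happy = style_encoder(happy_clip)              # embed a short example clip
    # z_old   = style_encoder(old_clip)
    # z_mix   = blend_styles(z_happy, z_old, alpha=0.5)
    # gesture = generator(speech_features, scale_style(z_mix, 1.2))

Because styles live in a shared latent space, such interpolation and scaling are simple vector operations, which is what allows control over styles unseen during training.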