Title

Audio-driven Neural Gesture Reenactment with Video Motion Graphs

Authors

Yang Zhou, Jimei Yang, Dingzeyu Li, Jun Saito, Deepali Aneja, Evangelos Kalogerakis

Abstract

Human speech is often accompanied by body gestures, including arm and hand gestures. We present a method that reenacts a high-quality video with gestures matching a target speech audio. The key idea of our method is to split and re-assemble clips from a reference video through a novel video motion graph encoding valid transitions between clips. To seamlessly connect different clips in the reenactment, we propose a pose-aware video blending network which synthesizes video frames around the stitched frames between two clips. Moreover, we develop an audio-based gesture searching algorithm to find the optimal order of the reenacted frames. Our system generates reenactments that are consistent with both the audio rhythms and the speech content. We evaluate our synthesized video quality quantitatively, qualitatively, and with user studies, demonstrating that our method produces videos of much higher quality and consistency with the target audio compared to previous work and baselines.
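To make the two core ideas concrete — a motion graph whose edges mark valid transitions between pose-similar frames, and a search over that graph guided by the target audio — here is a minimal illustrative sketch. All names, the pose-distance threshold, and the greedy scoring are assumptions for exposition; the paper's actual system uses a learned blending network and a more sophisticated audio-matching search.

```python
# Hypothetical sketch of a video motion graph. Nodes are frame indices;
# edges are natural playback successors plus "transition" edges between
# frames whose poses are similar enough to stitch. Not the authors' code.

def build_motion_graph(poses, threshold=1.0):
    """poses: list of per-frame pose feature vectors (plain Python lists)."""
    n = len(poses)
    graph = {i: [] for i in range(n)}
    for i in range(n - 1):
        graph[i].append(i + 1)  # natural playback edge to the next frame
    for i in range(n):
        for j in range(n):
            if abs(j - i) > 1:  # skip self and natural neighbors
                dist = sum((a - b) ** 2 for a, b in zip(poses[i], poses[j])) ** 0.5
                if dist < threshold:
                    graph[i].append(j)  # transition edge between similar poses
    return graph

def greedy_reenact(graph, scores, start, length):
    """Walk the graph for `length` frames, greedily picking the successor
    whose frame best matches the target audio (higher score = better).
    `scores` is a hypothetical per-frame audio-match score."""
    path = [start]
    cur = start
    for _ in range(length - 1):
        successors = graph[cur]
        if not successors:
            break  # dead end: no outgoing edge from this frame
        cur = max(successors, key=lambda f: scores[f])
        path.append(cur)
    return path
```

For example, with four frames whose poses make frames 0 and 2 nearly identical, the graph gains transition edges 0→2 and 2→0, and the greedy walk can loop back through them when the audio scores favor it. The real system replaces the greedy step with a global search over the reenacted frame order and smooths each stitch with the pose-aware blending network.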
