Paper Title
Identity-Preserving Realistic Talking Face Generation
Paper Authors
Paper Abstract
Speech-driven facial animation is useful for a variety of applications such as telepresence, chatbots, etc. The necessary attributes of a realistic facial animation are (1) audio-visual synchronization, (2) identity preservation of the target individual, (3) plausible mouth movements, and (4) the presence of natural eye blinks. Existing methods mostly address audio-visual lip synchronization, and only a few recent works have addressed the synthesis of natural eye blinks for overall video realism. In this paper, we propose a method for identity-preserving realistic facial animation from speech. We first generate person-independent facial landmarks from audio using DeepSpeech features, for invariance to different voices, accents, etc. To add realism, we impose eye blinks on the facial landmarks using unsupervised learning, and we retarget the person-independent landmarks to person-specific landmarks to preserve the identity-related facial structure, which helps in generating plausible mouth shapes for the target identity. Finally, we use an LSGAN to generate facial texture from the person-specific facial landmarks, with an attention mechanism that helps preserve identity-related texture. An extensive comparison of our proposed method with current state-of-the-art methods demonstrates significant improvements in lip synchronization accuracy, image reconstruction quality, sharpness, and identity preservation. A user study also reveals improved realism of our animation results over the state-of-the-art methods. To the best of our knowledge, this is the first work in speech-driven 2D facial animation that simultaneously addresses all the above-mentioned attributes of realistic speech-driven face animation.
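The abstract names an LSGAN as the texture generator. The least-squares GAN objective has a standard form (real samples are pushed toward label 1, generated samples toward label 0); the sketch below illustrates that objective only. The function names and dummy discriminator scores are illustrative assumptions, not details from the paper:

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # LSGAN discriminator loss: push scores on real textures toward 1
    # and scores on generated textures toward 0.
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # LSGAN generator loss: push discriminator scores on generated
    # textures toward 1 (i.e., make fakes look real).
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

# Hypothetical discriminator scores for a batch of real and generated faces.
d_real = np.array([0.9, 1.1, 1.0])
d_fake = np.array([0.1, -0.1, 0.0])

print(lsgan_d_loss(d_real, d_fake))  # near 0: discriminator separates well
print(lsgan_g_loss(d_fake))          # large: generator not yet fooling it
```

Compared with the original GAN's sigmoid cross-entropy, the least-squares loss also penalizes correctly classified samples that lie far from the decision boundary, which tends to yield more stable training and sharper generated images.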