Paper Title
MakeItTalk: Speaker-Aware Talking-Head Animation
Paper Authors
Paper Abstract
We present a method that generates expressive talking heads from a single facial image with audio as the only input. In contrast to previous approaches that attempt to learn direct mappings from audio to raw pixels or points for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of the lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting speaker-aware dynamics. Based on this intermediate representation, our method is able to synthesize photorealistic videos of entire talking heads with a full range of motion, and can also animate artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures within a single unified framework. We present extensive quantitative and qualitative evaluations of our method, in addition to user studies, demonstrating that the generated talking heads are of significantly higher quality than those of prior state-of-the-art methods.
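The abstract describes a pipeline of the form audio → (content embedding, speaker embedding) → per-frame facial-landmark motion → rendered frames. The sketch below is only an illustration of that data flow, not the authors' released implementation: the module names, network choices (an LSTM content encoder, a pooled speaker encoder, a linear landmark head), and all dimensions are assumptions made for clarity.

```python
# Minimal illustrative sketch of the audio -> landmark stage described in the
# abstract. NOT the paper's actual architecture; shapes and layers are assumed.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps an audio feature sequence to a per-frame content embedding."""
    def __init__(self, audio_dim=80, hidden_dim=256):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden_dim, batch_first=True)

    def forward(self, audio_feats):            # (B, T, audio_dim)
        content, _ = self.rnn(audio_feats)     # (B, T, hidden_dim)
        return content

class SpeakerEncoder(nn.Module):
    """Summarizes speaker identity/style into a single embedding."""
    def __init__(self, audio_dim=80, embed_dim=128):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(audio_dim, embed_dim), nn.ReLU())

    def forward(self, audio_feats):                 # (B, T, audio_dim)
        return self.proj(audio_feats).mean(dim=1)   # (B, embed_dim)

class LandmarkPredictor(nn.Module):
    """Predicts per-frame displacements for 68 facial landmarks (x, y, z)."""
    def __init__(self, hidden_dim=256, embed_dim=128, n_landmarks=68):
        super().__init__()
        self.head = nn.Linear(hidden_dim + embed_dim, n_landmarks * 3)

    def forward(self, content, speaker_emb):
        T = content.size(1)
        spk = speaker_emb.unsqueeze(1).expand(-1, T, -1)  # broadcast over time
        disp = self.head(torch.cat([content, spk], dim=-1))
        return disp.view(disp.size(0), T, -1, 3)          # (B, T, 68, 3)

# Usage: static landmarks detected on the single input image plus predicted
# per-frame displacements give animated landmarks; a separate renderer
# (image-to-image synthesis or cartoon warping) turns these into video frames.
audio = torch.randn(1, 100, 80)        # e.g. 100 frames of mel-spectrogram features
static_lm = torch.zeros(1, 68, 3)      # landmarks of the input portrait
content = ContentEncoder()(audio)
speaker = SpeakerEncoder()(audio)
animated_lm = static_lm.unsqueeze(1) + LandmarkPredictor()(content, speaker)
print(animated_lm.shape)               # torch.Size([1, 100, 68, 3])
```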