A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors

Paper Title

A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors

Authors

Zheng, Ruobing; Zhu, Zhou; Song, Bo; Ji, Changjiang

Abstract

Lip sync has emerged as a promising technique for generating mouth movements from audio signals. However, synthesizing a high-resolution and photorealistic virtual news anchor is still challenging. Lack of natural appearance, visual consistency, and processing efficiency are the main problems with existing methods. This paper presents a novel lip-sync framework specially designed for producing high-fidelity virtual news anchors. A pair of Temporal Convolutional Networks are used to learn the cross-modal sequential mapping from audio signals to mouth movements, followed by a neural rendering network that translates the synthetic facial map into a high-resolution and photorealistic appearance. This fully trainable framework provides end-to-end processing that outperforms traditional graphics-based methods in many low-delay applications. Experiments also show the framework has advantages over modern neural-based methods in both visual appearance and efficiency.
