Paper Title


Dynamic Neural Portraits

Authors

Michail Christos Doukas, Stylianos Ploumpis, Stefanos Zafeiriou

Abstract


We present Dynamic Neural Portraits, a novel approach to the problem of full-head reenactment. Our method generates photo-realistic video portraits by explicitly controlling head pose, facial expressions and eye gaze. Our proposed architecture is different from existing methods that rely on GAN-based image-to-image translation networks for transforming renderings of 3D faces into photo-realistic images. Instead, we build our system upon a 2D coordinate-based MLP with controllable dynamics. Our intuition to adopt a 2D-based representation, as opposed to recent 3D NeRF-like systems, stems from the fact that video portraits are captured by monocular stationary cameras, therefore, only a single viewpoint of the scene is available. Primarily, we condition our generative model on expression blendshapes, nonetheless, we show that our system can be successfully driven by audio features as well. Our experiments demonstrate that the proposed method is 270 times faster than recent NeRF-based reenactment methods, with our networks achieving speeds of 24 fps for resolutions up to 1024 x 1024, while outperforming prior works in terms of visual quality.
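The core idea of a 2D coordinate-based MLP can be illustrated with a minimal sketch: every pixel coordinate, together with a conditioning vector (e.g. expression blendshape weights), is mapped through an MLP to an RGB value. This is not the authors' exact architecture; the layer sizes, Fourier encoding, and random weights below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(xy, num_freqs=4):
    """Map 2D coordinates to sin/cos Fourier features (common in coordinate MLPs)."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    scaled = xy[..., None, :] * freqs[:, None]          # (..., F, 2)
    feats = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return feats.reshape(*xy.shape[:-1], -1)            # (..., F*4)

def mlp_rgb(xy, cond, weights):
    """Forward pass: encoded pixel coords + condition vector -> RGB in [0, 1]."""
    c = np.broadcast_to(cond, (*xy.shape[:-1], cond.shape[-1]))
    h = np.concatenate([positional_encoding(xy), c], axis=-1)
    for W, b in weights[:-1]:
        h = np.maximum(h @ W + b, 0.0)                  # ReLU hidden layers
    W, b = weights[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))           # sigmoid output -> RGB

# Random weights for illustration: input = 16 Fourier features + 8
# hypothetical expression-blendshape coefficients.
dims = [16 + 8, 64, 64, 3]
weights = [(rng.normal(0, 0.3, (i, o)), np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]

# "Render" a tiny 4x4 image for one expression code by querying every pixel.
ys, xs = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4), indexing="ij")
coords = np.stack([xs, ys], axis=-1).reshape(-1, 2)     # (16, 2) pixel coords
expr = rng.normal(size=8)                               # hypothetical driving signal
rgb = mlp_rgb(coords, expr, weights).reshape(4, 4, 3)
print(rgb.shape)
```

Because the image is produced by independent per-pixel queries rather than volumetric ray marching, such a 2D representation avoids the sampling cost that makes NeRF-style renderers slow, which is consistent with the speed advantage the abstract reports.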
