Paper Title


JNR: Joint-based Neural Rig Representation for Compact 3D Face Modeling

Paper Authors

Noranart Vesdapunt, Mitch Rundle, HsiangTao Wu, Baoyuan Wang

Abstract


In this paper, we introduce a novel approach to learning a 3D face model using a joint-based face rig and a neural skinning network. Thanks to the joint-based representation, our model enjoys significant advantages over prior blendshape-based models. First, it is very compact: our model is orders of magnitude smaller while still keeping strong modeling capacity. Second, because each joint has a semantic meaning, interactive facial geometry editing becomes easier and more intuitive. Third, through skinning, our model supports adding mouth interior and eyes, as well as accessories (hair, eyeglasses, etc.), in a simpler, more accurate, and more principled way. We argue that because the human face is highly structured and topologically consistent, it does not need to be learned entirely from data. Instead, we can leverage prior knowledge in the form of a human-designed 3D face rig to reduce the data dependency, and learn a compact yet strong face model from only a small dataset (fewer than one hundred 3D scans). To further improve the modeling capacity, we train a skinning-weight generator through adversarial learning. Experiments on fitting high-quality 3D scans (both neutral and expressive), noisy depth images, and RGB images demonstrate that its modeling capacity is on par with state-of-the-art face models such as FLAME and FaceWarehouse, even though the model is 10 to 20 times smaller. This suggests broad value in both graphics and vision applications on mobile and edge devices.
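The abstract's core idea is that mesh deformation is driven by joint transforms blended with per-vertex skinning weights (the weights being what the paper's neural network predicts). As a rough illustration of the underlying mechanism, here is a minimal linear blend skinning sketch in NumPy; this is the standard LBS formulation, not the paper's exact network or rig, and all names in it are illustrative:

```python
import numpy as np

def linear_blend_skinning(vertices, joint_transforms, weights):
    """Deform mesh vertices by blending per-joint rigid transforms.

    vertices:         (V, 3) rest-pose vertex positions
    joint_transforms: (J, 4, 4) homogeneous transform per joint
    weights:          (V, J) skinning weights, each row summing to 1
    """
    V = vertices.shape[0]
    # Lift to homogeneous coordinates: (V, 4)
    v_h = np.concatenate([vertices, np.ones((V, 1))], axis=1)
    # Apply every joint transform to every vertex: (J, V, 4)
    per_joint = np.einsum('jab,vb->jva', joint_transforms, v_h)
    # Blend the per-joint results with per-vertex weights: (V, 4)
    blended = np.einsum('vj,jva->va', weights, per_joint)
    return blended[:, :3]

# Toy example: 2 vertices, 2 joints; joint 0 is the identity,
# joint 1 translates by +1 along x.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])
T0 = np.eye(4)
T1 = np.eye(4)
T1[0, 3] = 1.0
transforms = np.stack([T0, T1])
w = np.array([[1.0, 0.0],   # vertex 0 follows joint 0 only
              [0.5, 0.5]])  # vertex 1 blends both joints equally
out = linear_blend_skinning(verts, transforms, w)
print(out)  # vertex 1 moves halfway: [[0, 0, 0], [1.5, 0, 0]]
```

Because the model's parameters are joint transforms plus a (generated) weight matrix rather than a large basis of blendshape offsets, this representation stays compact, which is the property the abstract emphasizes.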
