Paper Title


StylizedNeRF: Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning

Paper Authors

Yi-Hua Huang, Yue He, Yu-Jie Yuan, Yu-Kun Lai, Lin Gao

Abstract


3D scene stylization aims at generating stylized images of the scene from arbitrary novel views following a given set of style examples, while ensuring consistency when rendered from different views. Directly applying methods for image or video stylization to 3D scenes cannot achieve such consistency. Thanks to the recently proposed neural radiance fields (NeRF), we are able to represent a 3D scene in a consistent way. Consistent 3D scene stylization can be effectively achieved by stylizing the corresponding NeRF. However, there is a significant domain gap between style examples, which are 2D images, and NeRF, which is an implicit volumetric representation. To address this problem, we propose a novel mutual learning framework for 3D scene stylization that combines a 2D image stylization network and NeRF, fusing the stylization ability of the 2D network with the 3D consistency of NeRF. We first pre-train a standard NeRF of the 3D scene to be stylized and replace its color prediction module with a style network to obtain a stylized NeRF. This is followed by distilling the prior knowledge of spatial consistency from the NeRF to the 2D stylization network through an introduced consistency loss. We also introduce a mimic loss to supervise the mutual learning of the NeRF style module and fine-tune the 2D stylization decoder. To further enable our model to handle ambiguities in 2D stylization results, we introduce learnable latent codes that obey probability distributions conditioned on the style. They are attached to training samples as conditional inputs to better learn the style module in our novel stylized NeRF. Experimental results demonstrate that our method is superior to existing approaches in both visual quality and long-range consistency.
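The abstract describes two coupled objectives: a mimic loss that supervises the stylized NeRF's rendering against the 2D stylization network's output for the same view, and a consistency loss that distills multi-view agreement into the 2D network. A minimal numerical sketch, assuming simple MSE forms for both losses (the paper's exact formulations, weighting, and warping procedure may differ; all array names here are placeholders):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

# Hypothetical dummy outputs for one training view (H x W x 3 RGB in [0, 1]).
rng = np.random.default_rng(0)
nerf_stylized = rng.random((4, 4, 3))  # colors rendered by the stylized NeRF
cnn_stylized = rng.random((4, 4, 3))   # output of the 2D stylization network

# Mimic loss: the stylized NeRF's rendering is pushed to match
# the 2D stylization network's result for the same camera view.
mimic_loss = mse(nerf_stylized, cnn_stylized)

# Consistency loss (sketch): the 2D network's outputs for two views of the
# same scene, after warping view B into view A's frame using known geometry,
# should agree.  The warp itself is elided here and replaced by a placeholder.
view_a = rng.random((4, 4, 3))
view_b_warped_to_a = rng.random((4, 4, 3))  # placeholder for a geometric warp
consistency_loss = mse(view_a, view_b_warped_to_a)

# In training, gradients of these losses would update the respective networks;
# here we only combine them to show the joint objective's structure.
total = mimic_loss + consistency_loss
print(total >= 0.0)
```

In the actual framework these terms would be minimized jointly during mutual learning, with the NeRF providing geometric consistency and the 2D network providing stylization quality.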
