DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras

Paper Title

DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras

Authors

Ruizhi Shao, Zerong Zheng, Hongwen Zhang, Jingxiang Sun, Yebin Liu

Abstract

We propose DiffuStereo, a novel system using only sparse cameras (8 in this work) for high-quality 3D human reconstruction. At its core is a novel diffusion-based stereo module, which introduces diffusion models, a powerful class of generative models, into the iterative stereo matching network. To this end, we design a new diffusion kernel and additional stereo constraints to facilitate stereo matching and depth estimation in the network. We further present a multi-level stereo network architecture to handle high-resolution (up to 4K) inputs without requiring an unaffordable memory footprint. Given a set of sparse-view color images of a human, the proposed multi-level diffusion-based stereo network can produce highly accurate depth maps, which are then converted into a high-quality 3D human model through an efficient multi-view fusion strategy. Overall, our method enables automatic reconstruction of human models with quality on par with high-end dense-view camera rigs, and this is achieved using a much more lightweight hardware setup. Experiments show that our method outperforms state-of-the-art methods by a large margin both qualitatively and quantitatively.
