Paper Title

Monocular Real-Time Volumetric Performance Capture

Authors

Ruilong Li, Yuliang Xiu, Shunsuke Saito, Zeng Huang, Kyle Olszewski, Hao Li

Abstract

We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video, eliminating the need for expensive multi-view systems or cumbersome pre-acquisition of a personalized template model. Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu). While PIFu achieves high-resolution reconstruction in a memory-efficient manner, its computationally expensive inference prevents us from deploying such a system for real-time applications. To this end, we propose a novel hierarchical surface localization algorithm and a direct rendering method without explicitly extracting surface meshes. By culling unnecessary regions for evaluation in a coarse-to-fine manner, we successfully accelerate the reconstruction by two orders of magnitude from the baseline without compromising the quality. Furthermore, we introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples. We adaptively update the sampling probability of the training data based on the current reconstruction accuracy, which effectively alleviates reconstruction artifacts. Our experiments and evaluations demonstrate the robustness of our system to various challenging angles, illuminations, poses, and clothing styles. We also show that our approach compares favorably with the state-of-the-art monocular performance capture. Our proposed approach removes the need for multi-view studio settings and enables a consumer-accessible solution for volumetric capture.
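The hierarchical surface localization described above can be illustrated with a minimal sketch: evaluate the implicit function on a coarse grid, then flag only the cells whose corners straddle the 0.5 occupancy level set for finer evaluation, culling the rest. This is an assumption-laden toy (a unit sphere stands in for the learned pixel-aligned implicit function, and `surface_cells` is a hypothetical helper), not the paper's actual implementation.

```python
import numpy as np

def occupancy(pts):
    # Hypothetical stand-in for the learned implicit function:
    # occupancy of a unit sphere, in [0, 1], with the surface at 0.5.
    return 1.0 / (1.0 + np.exp(8.0 * (np.linalg.norm(pts, axis=-1) - 1.0)))

def surface_cells(res, lo=-1.5, hi=1.5):
    # Evaluate occupancy at the (res+1)^3 corners of a coarse grid and
    # flag cells whose corners disagree about inside/outside -- only
    # these cells need evaluation at the next, finer resolution.
    xs = np.linspace(lo, hi, res + 1)
    g = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    inside = occupancy(g.reshape(-1, 3)).reshape(g.shape[:3]) > 0.5
    all_in = np.ones((res, res, res), dtype=bool)
    any_in = np.zeros((res, res, res), dtype=bool)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                c = inside[dx:dx + res, dy:dy + res, dz:dz + res]
                all_in &= c
                any_in |= c
    return any_in & ~all_in  # boundary cells: mixed corner signs

b = surface_cells(16)
print(f"cells needing refinement: {b.sum() / b.size:.1%}")  # small fraction near the surface
```

Because the surface occupies only a thin shell of the volume, the fraction of cells that survive each culling round shrinks rapidly, which is where the claimed two-orders-of-magnitude speedup over dense evaluation comes from.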
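The Online Hard Example Mining idea — updating sampling probabilities so that examples with high reconstruction error are drawn more often — can be sketched as follows. The softmax-with-temperature update rule here is an illustrative assumption; the abstract does not specify the paper's exact update formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_probs(errors, temperature=1.0):
    # Hypothetical OHEM-style rule: sample training examples with
    # probability proportional to exp(error / T), so rare hard examples
    # are revisited more often than easy ones.
    w = np.exp(np.asarray(errors, dtype=float) / temperature)
    return w / w.sum()

errors = np.array([0.1, 0.1, 0.1, 2.0])  # one hard example among easy ones
p = update_probs(errors)
batch = rng.choice(len(errors), size=1000, p=p)
# the hard example (index 3) typically makes up most of the sampled batch
```

Re-estimating `errors` from the current model each epoch makes the sampling distribution track reconstruction accuracy adaptively, which is the mechanism the abstract credits for suppressing failure modes on rare challenging inputs.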
