Paper Title

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

Paper Authors

Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

Paper Abstract

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment. Notably, our method runs on datasets without any scene- or object-level 3D supervision. Our key insight is that considering humans and objects jointly gives rise to "3D common sense" constraints that can be used to resolve ambiguity. In particular, we introduce a scale loss that learns the distribution of object size from data; an occlusion-aware silhouette re-projection loss to optimize object pose; and a human-object interaction loss to capture the spatial layout of objects with which humans interact. We empirically validate that our constraints dramatically reduce the space of likely 3D spatial configurations. We demonstrate our approach on challenging, in-the-wild images of humans interacting with large objects (such as bicycles, motorcycles, and surfboards) and handheld objects (such as laptops, tennis rackets, and skateboards). We quantify the ability of our approach to recover human-object arrangements and outline remaining challenges in this relatively new domain. The project webpage can be found at https://jasonyzhang.com/phosa.
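
The abstract describes an objective built from three terms: a learned object-size (scale) prior, an occlusion-aware silhouette re-projection term, and a human-object interaction term. Below is a minimal, hypothetical PyTorch sketch of how such terms might be combined; the function names, inputs, weights, and exact formulations are illustrative assumptions, not the paper's actual implementation (see the project webpage for the real method and code).

import torch

def scale_loss(log_scale, prior_mean, prior_std):
    # Penalize object scales that deviate from a per-category size prior
    # (assumed here to be a Gaussian over log-scale learned from data).
    return (((log_scale - prior_mean) / prior_std) ** 2).mean()

def occlusion_aware_silhouette_loss(rendered_mask, target_mask, occluded_mask):
    # Compare the rendered object silhouette to the detected one, but only
    # on pixels that are not occluded by other instances.
    visible = 1.0 - occluded_mask
    diff = visible * (rendered_mask - target_mask) ** 2
    return diff.sum() / visible.sum().clamp(min=1.0)

def interaction_loss(human_points, object_points):
    # Pull paired human/object interaction regions together in 3D
    # (e.g., hands toward handlebars); correspondences are assumed given.
    return ((human_points - object_points) ** 2).sum(dim=-1).mean()

# Illustrative dummy inputs; in practice these would come from rendering the
# posed object mesh, instance segmentation, and a 3D human estimate.
log_scale = torch.zeros(1, requires_grad=True)
rendered = torch.rand(64, 64)
detected = (torch.rand(64, 64) > 0.5).float()
occluded = (torch.rand(64, 64) > 0.8).float()
hands = torch.rand(2, 3)
handlebars = torch.rand(2, 3)

total = (scale_loss(log_scale, prior_mean=0.0, prior_std=0.3)
         + occlusion_aware_silhouette_loss(rendered, detected, occluded)
         + interaction_loss(hands, handlebars))
total.backward()  # gradients flow to the optimized parameters (here, only log_scale)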
