Paper Title

BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images

Authors

Thu Nguyen-Phuoc, Christian Richardt, Long Mai, Yong-Liang Yang, Niloy Mitra

Abstract

We present BlockGAN, an image generative model that learns object-aware 3D scene representations directly from unlabelled 2D images. Current work on scene representation learning either ignores scene background or treats the whole scene as one object. Meanwhile, work that considers scene compositionality treats scene objects only as image patches or 2D layers with alpha maps. Inspired by the computer graphics pipeline, we design BlockGAN to learn to first generate 3D features of background and foreground objects, then combine them into 3D features for the whole scene, and finally render them into realistic images. This allows BlockGAN to reason over occlusion and interaction between objects' appearance, such as shadow and lighting, and provides control over each object's 3D pose and identity, while maintaining image realism. BlockGAN is trained end-to-end, using only unlabelled single images, without the need for 3D geometry, pose labels, object masks, or multiple views of the same scene. Our experiments show that using explicit 3D features to represent objects allows BlockGAN to learn disentangled representations both in terms of objects (foreground and background) and their properties (pose and identity).
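The abstract describes a three-stage generator: per-object 3D feature generation, composition into whole-scene 3D features, and rendering to a 2D image. The following is a minimal, hypothetical PyTorch sketch of that pipeline, not the authors' released implementation: the layer sizes, the element-wise maximum used for composition, and the depth-collapsing projection are illustrative assumptions, and the per-object 3D pose transformation that the abstract says BlockGAN controls is omitted for brevity.

```python
# Illustrative sketch of a BlockGAN-style compositional generator.
# All module names and shapes here are assumptions for exposition.
import torch
import torch.nn as nn


class ObjectFeatureGenerator(nn.Module):
    """Maps a per-object noise code to a 3D feature grid (C x D x H x W)."""

    def __init__(self, z_dim=128, channels=64, seed_grid=4):
        super().__init__()
        self.channels = channels
        self.seed_grid = seed_grid
        self.fc = nn.Linear(z_dim, channels * seed_grid ** 3)
        # 3D transposed convolutions upsample the 4^3 seed grid to 16^3.
        self.net = nn.Sequential(
            nn.ConvTranspose3d(channels, channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(channels, channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, z):
        g = self.seed_grid
        x = self.fc(z).view(-1, self.channels, g, g, g)
        return self.net(x)  # (B, C, 16, 16, 16) object features


class BlockGANSketch(nn.Module):
    def __init__(self, z_dim=128, channels=64):
        super().__init__()
        self.fg_gen = ObjectFeatureGenerator(z_dim, channels)
        self.bg_gen = ObjectFeatureGenerator(z_dim, channels)
        # A 2D decoder renders the projected scene features into an image
        # (a learned camera projection is omitted in this sketch).
        self.render = nn.Sequential(
            nn.Conv2d(channels * 16, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z_fg, z_bg):
        fg = self.fg_gen(z_fg)                # foreground 3D features
        bg = self.bg_gen(z_bg)                # background 3D features
        scene = torch.maximum(fg, bg)         # compose into one 3D scene
        b, c, d, h, w = scene.shape
        flat = scene.reshape(b, c * d, h, w)  # collapse depth for rendering
        return self.render(flat)              # (B, 3, 16, 16) image


# Usage: sample one noise code per object and render a composed scene.
g = BlockGANSketch()
img = g(torch.randn(2, 128), torch.randn(2, 128))
print(img.shape)  # torch.Size([2, 3, 16, 16])
```

Because each object has its own noise code and 3D feature block, resampling one code changes that object's identity while the rest of the scene is held fixed, which is the disentanglement the abstract reports.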
