Paper Title
Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image
Paper Authors
Abstract
Novel view synthesis from a single image has recently attracted a lot of attention, and it has been primarily advanced by 3D deep learning and rendering techniques. However, most work is still limited by synthesizing new views within relatively small camera motions. In this paper, we propose a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions. Our approach utilizes an autoregressive Transformer to perform sequential modeling of multiple frames, which reasons the relations between multiple frames and the corresponding cameras to predict the next frame. To facilitate learning and ensure consistency among generated frames, we introduce a locality constraint based on the input cameras to guide self-attention among a large number of patches across space and time. Our method outperforms state-of-the-art view synthesis approaches by a large margin, especially when synthesizing long-term future in indoor 3D scenes. Project page at https://xrenaa.github.io/look-outside-room/.
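The abstract describes a locality constraint that restricts self-attention among space-time patches to keep long autoregressive rollouts consistent. The sketch below illustrates the general idea with a simplified stand-in: the paper derives the constraint from the input camera poses, whereas here each patch simply attends causally to patches within a fixed temporal window. The function name, the `window` parameter, and the fixed-window rule are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def locality_attention_mask(num_frames, patches_per_frame, window=2):
    """Build a causal space-time attention mask (True = may attend).

    Simplified sketch: each patch attends to all patches in its own
    frame and in up to `window` preceding frames. The paper's real
    constraint is based on the input cameras; the fixed temporal
    window here is a stand-in assumption.
    """
    n = num_frames * patches_per_frame
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        qf = q // patches_per_frame              # frame index of query patch
        for k in range(n):
            kf = k // patches_per_frame          # frame index of key patch
            if kf <= qf and qf - kf <= window:   # causal + local in time
                mask[q, k] = True
    return mask

# 4 frames of 3 patches each; attend to own frame and one previous frame
mask = locality_attention_mask(num_frames=4, patches_per_frame=3, window=1)
```

In a Transformer, such a mask would be applied to the attention logits (e.g. setting disallowed positions to negative infinity before the softmax), so that each generated frame is predicted from a bounded local context rather than all past patches.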