Paper Title
Sequential View Synthesis with Transformer
Paper Authors
Paper Abstract
This paper addresses the problem of novel view synthesis by means of neural rendering, where we are interested in predicting the novel view at an arbitrary camera pose based on a given set of input images from other viewpoints. Using the known query pose and input poses, we create an ordered set of observations that leads to the target view. Thus, the problem of single novel view synthesis is reformulated as a sequential view prediction task. In this paper, the proposed Transformer-based Generative Query Network (T-GQN) extends neural-rendering methods by adding two new concepts. First, we use multi-view attention learning between context images to obtain multiple implicit scene representations. Second, we introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations. Finally, we evaluate our model on various challenging datasets and demonstrate that our model not only gives consistent predictions but also does not require any retraining for fine-tuning.
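To illustrate the first concept in the abstract, the following is a minimal sketch (not the authors' implementation) of multi-view attention over context images, assuming each of the V context views has already been encoded into a d-dimensional feature vector; the function name, the use of unprojected embeddings as queries/keys/values, and the dimensions are illustrative assumptions.

```python
# Hypothetical sketch: scaled dot-product self-attention across context-view
# embeddings, producing one attended scene representation per view.
# In the actual T-GQN, Q/K/V would come from learned projections inside a
# Transformer block; raw embeddings are used here only for illustration.
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_attention(context):
    """context: (V, d) array of per-view embeddings.

    Returns a (V, d) array: each row is a mixture of all context views,
    weighted by view-to-view affinity -- one implicit representation per view.
    """
    V, d = context.shape
    scores = context @ context.T / np.sqrt(d)   # (V, V) view-to-view affinities
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context                    # mix information across views

rng = np.random.default_rng(0)
reps = multi_view_attention(rng.normal(size=(3, 8)))
print(reps.shape)  # (3, 8)
```

In the full model these attended representations would then be consumed by the sequential rendering decoder; this sketch only shows how attention lets every view aggregate information from all other context views.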