Paper Title
Video Extrapolation in Space and Time
Paper Authors
Paper Abstract
Novel view synthesis (NVS) and video prediction (VP) are typically considered disjoint tasks in computer vision. However, they can both be seen as ways to observe the spatial-temporal world: NVS aims to synthesize a scene from a new point of view, while VP aims to see a scene from a new point of time. These two tasks provide complementary signals to obtain a scene representation, as viewpoint changes from spatial observations inform depth, and temporal observations inform the motion of cameras and individual objects. Inspired by these observations, we propose to study the problem of Video Extrapolation in Space and Time (VEST). We propose a model that leverages the self-supervision and the complementary cues from both tasks, while existing methods can only solve one of them. Experiments show that our method achieves performance better than or comparable to several state-of-the-art NVS and VP methods on indoor and outdoor real-world datasets.
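The abstract defines the task only at a high level. As a rough illustration of what "extrapolation in space and time" means as an interface, the minimal Python sketch below shows a hypothetical prediction call that takes past frames with their camera poses and queries a new pose and a new time. All names (Observation, VESTModel, predict) are placeholders invented for this sketch and do not describe the paper's actual architecture.

```python
# Hypothetical sketch of a VEST-style task interface (not the authors' model):
# given past frames and the camera poses/times that produced them, predict the
# frame observed at a future time from a new camera pose.

from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class Observation:
    frame: np.ndarray        # H x W x 3 RGB image
    camera_pose: np.ndarray  # 4 x 4 world-to-camera extrinsic matrix
    timestamp: float         # capture time of the frame


class VESTModel:
    """Toy placeholder. A real model would use viewpoint changes to infer
    depth and temporal changes to infer camera/object motion, then render
    the queried view; here we just return a blank frame of the right size."""

    def predict(self,
                history: List[Observation],
                target_pose: np.ndarray,
                target_time: float) -> np.ndarray:
        # NVS special case:  target_time == history[-1].timestamp, new pose.
        # VP special case:   target_pose == history[-1].camera_pose, new time.
        # VEST general case: both pose and time are extrapolated jointly.
        h, w, _ = history[-1].frame.shape
        return np.zeros((h, w, 3), dtype=np.float32)  # placeholder output
```

The point of the sketch is only that NVS and VP fall out as special cases of the same query, which is the complementarity the abstract argues for.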