Paper Title

Stochastic Video Prediction with Structure and Motion

Paper Authors

Adil Kaan Akan, Sadra Safadoust, Fatma Güney

Paper Abstract

While stochastic video prediction models enable future prediction under uncertainty, they mostly fail to model the complex dynamics of real-world scenes. For example, they cannot provide reliable predictions for scenes with a moving camera and independently moving foreground objects in driving scenarios. The existing methods fail to fully capture the dynamics of the structured world by only focusing on changes in pixels. In this paper, we assume that there is an underlying process creating observations in a video and propose to factorize it into static and dynamic components. We model the static part based on the scene structure and the ego-motion of the vehicle, and the dynamic part based on the remaining motion of the dynamic objects. By learning separate distributions of changes in foreground and background, we can decompose the scene into static and dynamic parts and separately model the change in each. Our experiments demonstrate that disentangling structure and motion helps stochastic video prediction, leading to better future predictions in complex driving scenarios on two real-world driving datasets, KITTI and Cityscapes.
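
The abstract's core idea, learning separate distributions over static changes (scene structure and ego-motion) and dynamic changes (residual motion of moving objects), can be sketched in a few lines. What follows is a minimal, hypothetical illustration, not the paper's actual architecture: the module names, layer sizes, and the simple MLP decoder are all assumptions made for this example.

# A minimal sketch (not the authors' implementation) of factorizing the
# stochastic change in a frame into a static part (scene structure and
# ego-motion) and a dynamic part (residual motion of moving objects),
# each with its own learned Gaussian distribution.
import torch
import torch.nn as nn

class ChangePrior(nn.Module):
    """Predicts a diagonal Gaussian over a latent 'change' variable
    and samples from it with the reparameterization trick."""
    def __init__(self, feat_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)

    def forward(self, h):
        h = self.net(h)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

class FactorizedPredictor(nn.Module):
    """Samples a static latent (background: structure + ego-motion) and a
    dynamic latent (foreground: object motion) from separate priors, then
    decodes the next frame from the history features and both latents.
    Dimensions and the MLP decoder are placeholders for the example."""
    def __init__(self, feat_dim=128, latent_dim=16, frame_dim=3 * 64 * 64):
        super().__init__()
        self.static_prior = ChangePrior(feat_dim, latent_dim)
        self.dynamic_prior = ChangePrior(feat_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 2 * latent_dim, 256), nn.ReLU(),
            nn.Linear(256, frame_dim), nn.Sigmoid())

    def forward(self, h):
        z_static, _, _ = self.static_prior(h)    # change due to ego-motion
        z_dynamic, _, _ = self.dynamic_prior(h)  # residual object motion
        return self.decoder(torch.cat([h, z_static, z_dynamic], dim=-1))

# Usage: h stands in for encoded past frames (the encoder is omitted).
h = torch.randn(4, 128)           # batch of 4 encoded histories
frame = FactorizedPredictor()(h)  # -> (4, 3*64*64) flattened next frame

Sampling the two latents from separate learned priors is what allows the uncertainty to decompose: stochasticity in the background (camera motion and scene structure) and in the foreground (independently moving objects) is captured independently, as the abstract describes.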
