Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders

Paper Title

Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders

Authors

Sarthak Bhagat, Shagun Uppal, Zhuyun Yin, Nengli Lim

Abstract

We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder that uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences. We improve upon previous work by establishing a framework by which multiple features, static or dynamic, can be disentangled. Specifically, we use fractional Brownian motions (fBM) and Brownian bridges (BB) to enforce an inter-frame correlation structure in each independent channel, and show that varying this structure enables one to capture different factors of variation in the data. We demonstrate the quality of our representations with experiments on three publicly available datasets, and also quantify the improvement using a video prediction task. Moreover, we introduce a novel geodesic loss function that takes into account the curvature of the data manifold to improve learning. Our experiments show that the combination of the improved representations with the novel loss function enables MGP-VAE to outperform the baselines in video prediction.
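To make the inter-frame correlation structure mentioned in the abstract more concrete, the following is a minimal sketch of the textbook covariance functions for fractional Brownian motion and a Brownian bridge, evaluated at discrete frame times. It is not the authors' implementation: the function names, the frame count, the Hurst parameter value, the bridge horizon, and the jitter term are all assumptions chosen for illustration.

```python
import numpy as np

def fbm_cov(times, hurst):
    """Textbook fBM covariance: 0.5 * (|s|^2H + |t|^2H - |s - t|^2H)."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    h2 = 2.0 * hurst
    return 0.5 * (np.abs(s) ** h2 + np.abs(u) ** h2 - np.abs(s - u) ** h2)

def bb_cov(times, horizon):
    """Textbook Brownian bridge covariance on [0, horizon]: min(s, t) - s * t / horizon."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return np.minimum(s, u) - s * u / horizon

def sample_paths(cov, n_paths, rng, jitter=1e-9):
    """Draw zero-mean Gaussian paths with the given covariance via a Cholesky factor."""
    # The small jitter on the diagonal is an assumption added for numerical stability.
    chol = np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
    return rng.standard_normal((n_paths, cov.shape[0])) @ chol.T

# Hypothetical setup: 8 frames, one latent channel per process.
n_frames = 8
frame_times = np.arange(1, n_frames + 1)
rng = np.random.default_rng(0)

fbm_K = fbm_cov(frame_times, hurst=0.1)           # Hurst value is illustrative only
bb_K = bb_cov(frame_times, horizon=n_frames + 1)  # interior points of the bridge

fbm_paths = sample_paths(fbm_K, n_paths=4, rng=rng)  # shape (4, 8)
bb_paths = sample_paths(bb_K, n_paths=4, rng=rng)    # shape (4, 8)
```

With a Hurst parameter of 0.5 the fBM covariance reduces to that of standard Brownian motion, min(s, t). Changing the Hurst parameter or switching to the bridge kernel changes how strongly frames are correlated within a channel, which is the kind of structural variation the abstract refers to.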
