Paper Title
A Deeper Look at the Unsupervised Learning of Disentangled Representations in $\beta$-VAE from the Perspective of Core Object Recognition
Paper Authors
Paper Abstract
The ability to recognize objects despite differences in appearance, known as Core Object Recognition, forms a critical part of human perception. While it is understood that the brain accomplishes Core Object Recognition through feedforward, hierarchical computations along the visual stream, the underlying algorithms that allow invariant representations to form downstream are still not well understood (DiCarlo et al., 2012). Various computational perceptual models have been built in an attempt to tackle the object identification task in an artificial perceptual setting. Artificial Neural Networks, computational graphs consisting of weighted edges and mathematical operations at vertices, are loosely inspired by neural networks in the brain and have proven effective at various visual perceptual tasks, including object characterization and identification (Pinto et al., 2008; DiCarlo et al., 2012). For many data analysis tasks, it is useful to learn representations in which each dimension is statistically independent of, and thus disentangled from, the others. If the underlying generative factors of the data are also statistically independent, Bayesian inference of the latent variables can form disentangled representations. This thesis constitutes a research project exploring $\beta$-VAE, a generalization of the Variational Autoencoder (VAE) that aims to learn disentangled representations using variational inference. $\beta$-VAE incorporates a hyperparameter $\beta$ and enforces conditional independence of its bottleneck neurons, which is in general not compatible with the statistical independence of the latent variables. This text examines the architecture and provides analytical and numerical arguments aimed at demonstrating that this incompatibility leads to non-monotonic inference performance in $\beta$-VAE, with a finite optimal $\beta$.
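For reference, a standard form of the $\beta$-VAE objective (a sketch of the training criterion the abstract refers to, not necessarily the exact formulation used in the thesis) weights the KL term of the VAE evidence lower bound by $\beta$:

$$
\mathcal{L}(\theta, \phi; x, \beta) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

Setting $\beta = 1$ recovers the standard VAE; $\beta > 1$ increases the pressure toward a factorized approximate posterior, which is the conditional-independence constraint on the bottleneck neurons that the abstract discusses. The following is a minimal PyTorch sketch of this loss, assuming a diagonal-Gaussian encoder and a Bernoulli decoder; the function name and tensor arguments are illustrative placeholders, not code from the thesis.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Negative beta-VAE ELBO for one batch (lower is better)."""
    # Reconstruction term: -E_q[log p(x|z)] for a Bernoulli decoder.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior.
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    # beta = 1 recovers the standard VAE objective; beta > 1 strengthens
    # the bottleneck pressure toward conditionally independent latents.
    return recon + beta * kl
```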