Paper Title


Representational aspects of depth and conditioning in normalizing flows

Paper Authors

Frederic Koehler, Viraj Mehta, Andrej Risteski

Paper Abstract


Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. This is desirable both for evaluating the fit of a model, and for ease of training, as maximizing the likelihood can be done by gradient descent. However, training normalizing flows comes with difficulties as well: models which produce good samples typically need to be extremely deep -- which comes with accompanying vanishing/exploding gradient problems. A very related problem is that they are often poorly conditioned: since they are parametrized as invertible maps from $\mathbb{R}^d \to \mathbb{R}^d$, and typical training data like images intuitively is lower-dimensional, the learned maps often have Jacobians that are close to being singular. In our paper, we tackle representational aspects around depth and conditioning of normalizing flows: both for general invertible architectures, and for a particular common architecture, affine couplings. We prove that $\Theta(1)$ affine coupling layers suffice to exactly represent a permutation or $1 \times 1$ convolution, as used in GLOW, showing that representationally the choice of partition is not a bottleneck for depth. We also show that shallow affine coupling networks are universal approximators in Wasserstein distance if ill-conditioning is allowed, and experimentally investigate related phenomena involving padding. Finally, we show a depth lower bound for general flow architectures with few neurons per layer and bounded Lipschitz constant.
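To make the architecture discussed in the abstract concrete, the sketch below illustrates a single affine coupling layer: the input is split into two halves, one half passes through unchanged, and the other is scaled and shifted by functions of the first half, which makes the map invertible in closed form with a cheap log-determinant Jacobian. This is a minimal illustration only, not the authors' implementation; the function names (coupling_forward, scale_net, etc.) and the toy linear "networks" are hypothetical choices for the example.

```python
# Minimal sketch of one affine coupling layer (illustrative, not the paper's code).
import numpy as np

def coupling_forward(x, scale_net, shift_net):
    """Forward pass y = f(x). scale_net and shift_net map R^{d/2} -> R^{d/2};
    they need not be invertible themselves for f to be invertible."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s, t = scale_net(x1), shift_net(x1)
    y2 = x2 * np.exp(s) + t
    log_det_jac = s.sum(axis=-1)  # log |det J_f(x)| is just the sum of log-scales
    return np.concatenate([x1, y2], axis=-1), log_det_jac

def coupling_inverse(y, scale_net, shift_net):
    """Exact inverse x = f^{-1}(y), recovered in closed form."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    s, t = scale_net(y1), shift_net(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=-1)

# Toy usage: random linear maps stand in for the scale/shift networks.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
scale_net = lambda h: h @ A * 0.1  # small scales keep the map well-conditioned
shift_net = lambda h: h @ B
x = rng.normal(size=(5, 4))
y, logdet = coupling_forward(x, scale_net, shift_net)
assert np.allclose(coupling_inverse(y, scale_net, shift_net), x)
```

Note how only half of the coordinates are transformed per layer; this is why layers are stacked with alternating (or permuted) partitions, and why the paper's question of how the choice of partition and the depth interact is natural.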
