Paper Title


Representational aspects of depth and conditioning in normalizing flows

Paper Authors

Frederic Koehler, Viraj Mehta, Andrej Risteski

Paper Abstract


Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. This is desirable both for evaluating the fit of a model, and for ease of training, as maximizing the likelihood can be done by gradient descent. However, training normalizing flows comes with difficulties as well: models which produce good samples typically need to be extremely deep -- which comes with accompanying vanishing/exploding gradient problems. A very related problem is that they are often poorly conditioned: since they are parametrized as invertible maps from $\mathbb{R}^d \to \mathbb{R}^d$, and typical training data like images intuitively is lower-dimensional, the learned maps often have Jacobians that are close to being singular. In our paper, we tackle representational aspects around depth and conditioning of normalizing flows: both for general invertible architectures, and for a particular common architecture, affine couplings. We prove that $\Theta(1)$ affine coupling layers suffice to exactly represent a permutation or $1 \times 1$ convolution, as used in GLOW, showing that representationally the choice of partition is not a bottleneck for depth. We also show that shallow affine coupling networks are universal approximators in Wasserstein distance if ill-conditioning is allowed, and experimentally investigate related phenomena involving padding. Finally, we show a depth lower bound for general flow architectures with few neurons per layer and bounded Lipschitz constant.
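To make the architecture discussed in the abstract concrete, the sketch below illustrates a single affine coupling layer: the input is split into two halves, one half passes through unchanged, and the other is scaled and shifted by functions of the first half, which makes the map invertible in closed form with a cheap log-determinant Jacobian. This is a minimal illustration only, not the authors' implementation; the function names (coupling_forward, scale_net, etc.) and the toy linear "networks" are hypothetical choices for the example.

```python
# Minimal sketch of one affine coupling layer (illustrative, not the paper's code).
import numpy as np

def coupling_forward(x, scale_net, shift_net):
    """Forward pass y = f(x). scale_net and shift_net map R^{d/2} -> R^{d/2};
    they need not be invertible themselves for f to be invertible."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s, t = scale_net(x1), shift_net(x1)
    y2 = x2 * np.exp(s) + t
    log_det_jac = s.sum(axis=-1)  # log |det J_f(x)| is just the sum of log-scales
    return np.concatenate([x1, y2], axis=-1), log_det_jac

def coupling_inverse(y, scale_net, shift_net):
    """Exact inverse x = f^{-1}(y), recovered in closed form."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    s, t = scale_net(y1), shift_net(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=-1)

# Toy usage: random linear maps stand in for the scale/shift networks.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
scale_net = lambda h: h @ A * 0.1  # small scales keep the map well-conditioned
shift_net = lambda h: h @ B
x = rng.normal(size=(5, 4))
y, logdet = coupling_forward(x, scale_net, shift_net)
assert np.allclose(coupling_inverse(y, scale_net, shift_net), x)
```

Note how only half of the coordinates are transformed per layer; this is why layers are stacked with alternating (or permuted) partitions, and why the paper's question of how the choice of partition and the depth interact is natural.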
