Natural Reweighted Wake-Sleep

Paper title

Natural Reweighted Wake-Sleep

Authors

Csongor Várady, Riccardo Volpi, Luigi Malagò, Nihat Ay

Abstract

Helmholtz Machines (HMs) are a class of generative models composed of two Sigmoid Belief Networks (SBNs), acting respectively as an encoder and a decoder. These models are commonly trained using a two-step optimization algorithm called Wake-Sleep (WS) and, more recently, by improved versions such as Reweighted Wake-Sleep (RWS) and Bidirectional Helmholtz Machines (BiHM). The locality of the connections in an SBN induces sparsity in the Fisher Information Matrices associated with the probabilistic models, in the form of a finely-grained block-diagonal structure. In this paper we exploit this property to efficiently train SBNs and HMs using the natural gradient. We present a novel algorithm, called Natural Reweighted Wake-Sleep (NRWS), that corresponds to the geometric adaptation of its standard version. In a similar manner, we also introduce the Natural Bidirectional Helmholtz Machine (NBiHM). Unlike previous work, we show that for HMs the natural gradient can be computed efficiently without introducing any approximation in the structure of the Fisher information matrix. Experiments on standard datasets from the literature show a consistent improvement of NRWS and NBiHM, not only with respect to their non-geometric baselines but also with respect to state-of-the-art training algorithms for HMs. The improvement is quantified both in speed of convergence and in the value of the log-likelihood reached after training.
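
To illustrate why a block-diagonal Fisher information matrix makes the natural gradient cheap, the following is a minimal sketch, not the paper's NRWS implementation. It assumes the Fisher blocks and the matching gradient blocks are already available (here they are random placeholders), and the function name, block sizes, and `damping` parameter are all hypothetical. It only shows that solving each block independently gives the same direction as solving with the full matrix.

```python
import numpy as np

def natural_gradient_blockwise(grad_blocks, fisher_blocks, damping=0.0):
    """Solve F_b x_b = g_b independently for each diagonal block.

    `damping` (an assumption, not taken from the abstract) adds a small
    multiple of the identity to each block for numerical stability.
    """
    steps = []
    for g, F in zip(grad_blocks, fisher_blocks):
        F_damped = F + damping * np.eye(F.shape[0])
        steps.append(np.linalg.solve(F_damped, g))
    return steps

def random_spd(rng, n):
    """Random symmetric positive-definite matrix standing in for a Fisher block."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(0)
sizes = [3, 2, 4]  # hypothetical block sizes
fisher_blocks = [random_spd(rng, n) for n in sizes]
grad_blocks = [rng.standard_normal(n) for n in sizes]

# Block-wise natural gradient: one small linear solve per block.
blockwise = np.concatenate(natural_gradient_blockwise(grad_blocks, fisher_blocks))

# Dense reference: assemble the full block-diagonal matrix and solve once.
total = sum(sizes)
F_full = np.zeros((total, total))
offset = 0
for F in fisher_blocks:
    n = F.shape[0]
    F_full[offset:offset + n, offset:offset + n] = F
    offset += n
dense = np.linalg.solve(F_full, np.concatenate(grad_blocks))
```

With blocks of size n_b, the block-wise route costs on the order of the sum of n_b^3 rather than the cube of the total parameter count, which is the kind of saving the block-diagonal structure allows. How the Fisher blocks themselves are estimated for an SBN is not covered by this sketch.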
