Paper Title
Robust Training of Vector Quantized Bottleneck Models
Paper Authors
Paper Abstract
In this paper we demonstrate methods for reliable and efficient training of discrete representations using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn non-trivial representations of speech, applicable to unsupervised voice conversion, and to reach state-of-the-art performance on unit discovery tasks. For unsupervised representation learning, they have become viable alternatives to continuous latent variable models such as the Variational Auto-Encoder (VAE). However, training deep discrete variable models is challenging, due to the inherent non-differentiability of the discretization operation. In this paper we focus on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts. It quantizes encoder outputs with on-line $k$-means clustering. We show that codebook learning can suffer from poor initialization and from non-stationarity of the clustered encoder outputs. We demonstrate that these problems can be successfully overcome by increasing the learning rate for the codebook and by periodic data-dependent codeword re-initialization. As a result, we achieve more robust training across different tasks and significantly increase the usage of latent codewords, even with large codebooks. This has practical benefits, for instance, in unsupervised representation learning, where large codebooks may lead to disentanglement of latent representations.
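To make the two remedies named in the abstract concrete, here is a minimal sketch, not the authors' reference code, of a VQ bottleneck with a straight-through estimator, a separately boosted learning rate for the codebook, and periodic data-dependent re-initialization of rarely used codewords. All names (`VQBottleneck`, `reinit_dead_codes`, the hyperparameter values) are illustrative assumptions, not taken from the paper.

```python
# Sketch of a VQ-VAE bottleneck with usage tracking, a higher codebook learning
# rate, and data-dependent re-initialization of dead codewords (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F


class VQBottleneck(nn.Module):
    def __init__(self, num_codes=512, dim=64, commitment_cost=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.commitment_cost = commitment_cost
        # Running count of how often each codeword has been selected.
        self.register_buffer("usage", torch.zeros(num_codes))

    def forward(self, z_e):
        # z_e: (batch, dim) encoder outputs.
        distances = torch.cdist(z_e, self.codebook.weight)   # (batch, num_codes)
        indices = distances.argmin(dim=1)                     # nearest codeword
        z_q = self.codebook(indices)

        # Track codeword usage for later re-initialization.
        self.usage.scatter_add_(
            0, indices, torch.ones_like(indices, dtype=self.usage.dtype)
        )

        # Codebook and commitment losses as in the standard VQ-VAE objective.
        codebook_loss = F.mse_loss(z_q, z_e.detach())
        commitment_loss = F.mse_loss(z_e, z_q.detach())
        loss = codebook_loss + self.commitment_cost * commitment_loss

        # Straight-through estimator: gradients bypass the discretization.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices, loss

    @torch.no_grad()
    def reinit_dead_codes(self, recent_z_e, min_usage=1.0):
        # Data-dependent re-initialization: replace unused codewords with
        # randomly chosen recent encoder outputs, then reset the counters.
        dead = torch.nonzero(self.usage < min_usage, as_tuple=False).squeeze(1)
        if dead.numel() > 0:
            picks = recent_z_e[torch.randint(0, recent_z_e.size(0), (dead.numel(),))]
            self.codebook.weight.data[dead] = picks
        self.usage.zero_()


# Hypothetical training setup: a larger learning rate for the codebook than for
# the rest of the model, with reinit_dead_codes() called every few epochs.
vq = VQBottleneck()
optimizer = torch.optim.Adam([
    {"params": vq.codebook.parameters(), "lr": 1e-2},  # boosted codebook LR
    # {"params": encoder.parameters(), "lr": 1e-3},    # remaining parameters
])
```

The design choice to illustrate is that the codebook is the only part of the model whose parameters are learned by (online) $k$-means-style clustering of encoder outputs, so it can be given its own optimizer group and its own maintenance step without touching the encoder or decoder.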