Paper Title


A Probabilistic Formulation of Unsupervised Text Style Transfer

Paper Authors

Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick

Abstract


We present a deep generative model for unsupervised text style transfer that unifies previously proposed non-generative techniques. Our probabilistic approach models non-parallel data from two domains as a partially observed parallel corpus. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns to transform sequences from one domain to another in a completely unsupervised fashion. In contrast with traditional generative sequence models (e.g. the HMM), our model makes few assumptions about the data it generates: it uses a recurrent language model as a prior and an encoder-decoder as a transduction distribution. While computation of marginal data likelihood is intractable in this model class, we show that amortized variational inference admits a practical surrogate. Further, by drawing connections between our variational objective and other recent unsupervised style transfer and machine translation techniques, we show how our probabilistic view can unify some known non-generative objectives such as backtranslation and adversarial loss. Finally, we demonstrate the effectiveness of our method on a wide range of unsupervised style transfer tasks, including sentiment transfer, formality transfer, word decipherment, author imitation, and related language translation. Across all style transfer tasks, our approach yields substantial gains over state-of-the-art non-generative baselines, including the state-of-the-art unsupervised machine translation techniques that our approach generalizes. Further, we conduct experiments on a standard unsupervised machine translation task and find that our unified approach matches the current state-of-the-art.
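As a brief sketch of the variational surrogate the abstract mentions (the notation here is assumed for illustration, not taken from the paper): let $x$ be an observed sequence in one domain and $y$ its hypothesized latent parallel sequence in the other domain, with a recurrent language-model prior $p_{\mathrm{LM}}(y)$, an encoder-decoder transduction distribution $p_\theta(x \mid y)$, and an amortized inference network $q_\phi(y \mid x)$. The intractable marginal likelihood and its evidence lower bound then take the standard form:

```latex
\log p(x) \;=\; \log \sum_{y} p_\theta(x \mid y)\, p_{\mathrm{LM}}(y)
\;\ge\; \mathbb{E}_{q_\phi(y \mid x)}\!\left[ \log p_\theta(x \mid y) \right]
\;-\; \mathrm{KL}\!\left( q_\phi(y \mid x) \,\middle\|\, p_{\mathrm{LM}}(y) \right)
```

Under this reading, sampling $y \sim q_\phi(y \mid x)$ and reconstructing $x$ from it resembles a backtranslation step, which is one of the connections to non-generative objectives that the abstract draws.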
