Paper Title
Deep Latent-Variable Models for Text Generation
Paper Authors
Paper Abstract
Text generation aims to produce human-like natural language output for downstream tasks. It covers a wide range of applications such as machine translation, document summarization, and dialogue generation. Recently, deep neural network-based end-to-end architectures have been widely adopted. The end-to-end approach conflates all sub-modules, which used to be designed by complex handcrafted rules, into a holistic encode-decode architecture. Given enough training data, it is able to achieve state-of-the-art performance while avoiding the need for language- or domain-dependent knowledge. Nonetheless, deep learning models are known to be extremely data-hungry, and the text they generate usually suffers from low diversity, interpretability, and controllability. As a result, it is difficult to trust their output in real-life applications. Deep latent-variable models, by specifying a probability distribution over an intermediate latent process, provide a potential way of addressing these problems while maintaining the expressive power of deep neural networks. This dissertation presents how deep latent-variable models can improve over the standard encoder-decoder model for text generation.
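For reference, a minimal sketch of the general formulation the abstract alludes to (the standard VAE-style deep latent-variable setup, not necessarily the dissertation's specific models): a latent variable $z$ mediates the generation of a text sequence $x$, and because the marginal likelihood is intractable, training typically maximizes the evidence lower bound (ELBO) with an approximate posterior $q_\phi(z \mid x)$. Here $\theta$ parameterizes the decoder and $\phi$ the inference network; the notation is standard, not taken from the source.

\[
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz,
\qquad
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\!\big(q_\phi(z \mid x)\,\big\|\,p(z)\big).
\]

The KL term regularizes the latent space toward the prior $p(z)$, which is what makes the intermediate latent process interpretable and controllable relative to a plain encoder-decoder model.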