Paper Title
Model Criticism for Long-Form Text Generation
Paper Authors
Paper Abstract
Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression). Here, we propose to apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of the generated text. Model criticism compares the distributions between real and generated data in a latent space obtained according to an assumptive generative process. Different generative processes identify specific failure modes of the underlying model. We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality -- and find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
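The abstract describes model criticism as comparing the distributions of real and generated data in a latent space. As a rough illustration (not the paper's actual method, which posits specific generative processes for discourse structure), the sketch below compares two sets of latent vectors with a generic two-sample statistic, maximum mean discrepancy (MMD) with an RBF kernel. The function `mmd_rbf` and the Gaussian "latents" are hypothetical stand-ins for encodings of real and model-generated texts:

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Squared maximum mean discrepancy between samples x and y,
    using an RBF kernel (biased V-statistic estimate)."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
# Hypothetical latent encodings: "real" texts vs. two candidate models.
real_latents = rng.normal(0.0, 1.0, size=(200, 8))
gen_latents  = rng.normal(0.5, 1.0, size=(200, 8))  # shifted: structural mismatch
same_latents = rng.normal(0.0, 1.0, size=(200, 8))  # matches the real distribution

# A model whose latents diverge from the real distribution scores higher.
print(mmd_rbf(real_latents, gen_latents) > mmd_rbf(real_latents, same_latents))
```

A larger discrepancy between real and generated latents would flag the failure mode probed by the chosen generative process (e.g., loss of coherence); the paper's criticism framework formalizes this comparison under explicit probabilistic assumptions rather than a kernel statistic.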