Paper Title

Unsupervised Opinion Summarization with Noising and Denoising

Authors

Reinald Kim Amplayo, Mirella Lapata

Abstract

The supervised training of high-capacity models on large datasets containing hundreds of thousands of document-summary pairs is critical to the recent success of deep learning techniques for abstractive summarization. Unfortunately, in most domains (other than news) such training data is not available and cannot be easily sourced. In this paper we enable the use of supervised learning for the setting where there are only documents available (e.g., product or business reviews) without ground truth summaries. We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof which we treat as pseudo-review input. We introduce several linguistically motivated noise generation functions and a summarization model which learns to denoise the input and generate the original review. At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise. Extensive automatic and human evaluation shows that our model brings substantial improvements over both abstractive and extractive baselines.
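The noising-and-denoising setup described above can be illustrated with a minimal sketch of the synthetic-pair construction: sample a review from the corpus, treat it as the pseudo-summary, and generate noisy copies of it that serve as the pseudo-review input for a denoising summarizer. The word-level noising below (random token dropping and replacement) is only an illustrative stand-in, not the paper's linguistically motivated noise functions, and names such as noise_review and make_synthetic_pair are hypothetical.

```python
# Sketch of synthetic (pseudo-reviews, pseudo-summary) pair construction,
# assuming simple word-level noising rather than the paper's noise functions.
import random
from typing import List, Tuple

def noise_review(tokens: List[str], vocab: List[str],
                 drop_prob: float = 0.1, replace_prob: float = 0.1) -> List[str]:
    """Produce a noisy copy of a review by dropping or replacing tokens."""
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < drop_prob:
            continue                                 # drop the token
        if r < drop_prob + replace_prob:
            noisy.append(random.choice(vocab))       # replace with a random vocabulary word
        else:
            noisy.append(tok)
    return noisy

def make_synthetic_pair(corpus: List[str],
                        n_noisy: int = 8) -> Tuple[List[List[str]], List[str]]:
    """Sample one review as the pseudo-summary; build noisy pseudo-reviews as the input set."""
    vocab = list({tok for review in corpus for tok in review.split()})
    summary = random.choice(corpus).split()
    pseudo_reviews = [noise_review(summary, vocab) for _ in range(n_noisy)]
    return pseudo_reviews, summary                   # (input documents, target summary)

if __name__ == "__main__":
    corpus = [
        "the pizza was great and the staff were friendly",
        "service was slow but the food made up for it",
        "great location , decent prices , friendly staff",
    ]
    inputs, target = make_synthetic_pair(corpus, n_noisy=3)
    for doc in inputs:
        print("pseudo-review:", " ".join(doc))
    print("pseudo-summary:", " ".join(target))
```

A model trained on such pairs learns to map noisy pseudo-reviews back to the clean review, so at test time it can treat non-consensus opinions in genuine reviews as noise to be removed.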
