Paper Title

Tweet to News Conversion: An Investigation into Unsupervised Controllable Text Generation

Authors

Ahmad, Zishan; S, Mukuntha N; Ekbal, Asif; Bhattacharyya, Pushpak

Abstract

Text generator systems have become extremely popular with the advent of recent deep learning models such as encoder-decoder architectures. Controlling the information and style of the generated output without supervision is an important and challenging Natural Language Processing (NLP) task. In this paper, we define the task of constructing a coherent paragraph from a set of disaster-domain tweets, without any parallel data. We tackle the problem by building two systems in a pipeline. The first system focuses on unsupervised style transfer and converts the individual tweets into news sentences. The second system stitches together the outputs of the first system to form a coherent news paragraph. We also propose a novel training mechanism: sentences are split into propositions, and the second system is trained to merge them back into sentences. We create a validation and test set consisting of tweet-sets and their equivalent news paragraphs to perform empirical evaluation. In a completely unsupervised setting, our model achieves a BLEU score of 19.32, while successfully transferring style and joining tweets to form a meaningful news paragraph.
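The abstract's training idea (split sentences into propositions, then train a system to merge them) can be illustrated with a minimal sketch of how such training pairs might be built from unlabeled news text. This is not the authors' implementation: the regex-based splitter and the names `split_into_propositions` and `build_merge_pairs` are hypothetical stand-ins, and the paper may split propositions quite differently.

```python
import re
from typing import List, Tuple

# Assumption: a crude clause splitter stands in for the paper's
# proposition-splitting step, which the abstract does not specify.
_SPLIT_PATTERN = re.compile(r",\s*(?:and|but|while)\s+|;\s*")


def split_into_propositions(sentence: str) -> List[str]:
    """Split one sentence into short clause-like 'propositions' (hypothetical)."""
    body = sentence.strip().rstrip(".")
    parts = [p.strip() for p in _SPLIT_PATTERN.split(body) if p.strip()]
    # Re-capitalize each clause and restore the final period.
    return [p[0].upper() + p[1:] + "." for p in parts]


def build_merge_pairs(sentences: List[str]) -> List[Tuple[List[str], str]]:
    """Build (propositions, original sentence) pairs for a merging model.

    The propositions are the model input and the original sentence is the
    target, so no human-written parallel data is required.
    """
    pairs = []
    for sentence in sentences:
        propositions = split_into_propositions(sentence)
        if len(propositions) > 1:  # single-clause sentences give no merge signal
            pairs.append((propositions, sentence))
    return pairs
```

Each resulting pair could then serve as a training example for a sequence-to-sequence model, with the propositions as input and the original sentence as the target. The model itself is outside the scope of this sketch.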
