Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation

Paper title

Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation

Authors

Yujie Xing, Jinglun Cai, Nils Barlaug, Peng Liu, Jon Atle Gulla

Abstract

Open-domain conversational systems are assumed to generate equally good responses across multiple domains. Previous work has achieved good performance on a single corpus, but training and evaluating on multiple corpora from different domains is less studied. This paper explores methods for generating relevant responses for each of multiple multi-domain corpora. We first examine interleaved learning, which intermingles multiple corpora, as the baseline. We then investigate two multi-domain learning methods, labeled learning and multi-task labeled learning, which encode each corpus through a unique corpus embedding. Furthermore, we propose Domain-specific Frequency (DF), a novel word-level importance weight that measures the relative importance of a word for a specific corpus compared to other corpora. Based on DF, we propose weighted learning, a method that integrates DF into the loss function. We also adopt DF as a new evaluation metric. Extensive experiments show that our methods yield significant improvements in both automatic and human evaluation. We share our code and data for reproducibility.
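The abstract does not state the formula for DF or the exact form of the weighted loss, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a word's weight for a corpus is its relative frequency in that corpus divided by the sum of its relative frequencies across all corpora, and that the weighted loss scales each target token's negative log-likelihood by that weight. The names `relative_frequencies`, `domain_weights`, and `weighted_nll` are hypothetical.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def domain_weights(corpora):
    """Assumed DF-like weight: a word's relative frequency in one corpus
    divided by the sum of its relative frequencies over all corpora.

    corpora: dict mapping corpus name -> list of tokens.
    Returns: dict mapping corpus name -> {word: weight in (0, 1]}.
    """
    rel = {name: relative_frequencies(toks) for name, toks in corpora.items()}
    weights = {}
    for name, freqs in rel.items():
        weights[name] = {}
        for word, freq in freqs.items():
            denom = sum(other.get(word, 0.0) for other in rel.values())
            weights[name][word] = freq / denom
    return weights


def weighted_nll(token_log_probs, target_tokens, word_weights, default=0.0):
    """Assumed weighted loss: sum of -weight(token) * log p(token)."""
    return sum(
        -word_weights.get(tok, default) * lp
        for lp, tok in zip(token_log_probs, target_tokens)
    )


# Toy corpora, invented for illustration only.
corpora = {
    "movies": "the film was great the actor".split(),
    "tech": "the code was broken the compiler".split(),
}
weights = domain_weights(corpora)
loss = weighted_nll([-1.0, -2.0], ["film", "the"], weights["movies"])
```

Under this assumed definition, a word that appears only in one corpus ("film") receives weight 1 for that corpus, while a word shared equally by both corpora ("the") receives weight 0.5, so corpus-specific words contribute more to the loss than generic ones.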
