Paper Title

A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation

Paper Authors

Yu Cao, Wei Bi, Meng Fang, Shuming Shi, Dacheng Tao

Paper Abstract

Towards building intelligent dialogue agents, there has been growing interest in introducing explicit personas into generation models. However, with limited persona-based dialogue data at hand, it may be difficult to train a dialogue generation model well. We point out that the data challenges of this generation task lie in two aspects: first, it is expensive to scale up current persona-based dialogue datasets; second, each data sample in this task is more complex to learn with than conventional dialogue data. To alleviate these data issues, we propose a data manipulation method which is model-agnostic and can be combined with any persona-based dialogue generation model to improve its performance. The original training samples are first distilled and are thus expected to be easier to fit. Next, we show various effective ways to diversify such easier distilled data. A given base model is then trained via the constructed data curricula, i.e., first on augmented distilled samples and then on original ones. Experiments illustrate the superiority of our method with two strong base dialogue models (a Transformer encoder-decoder and GPT2).
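
The pipeline described in the abstract (distill, diversify, then a two-stage curriculum) can be summarized in a short sketch. The snippet below is a minimal illustration only, not the authors' released code: `distill_sample`-style callbacks `distill`, `augment`, and `train_epoch` are hypothetical placeholders standing in for the paper's distillation, diversification, and model-update steps.

```python
from typing import Callable, List, Tuple

# A persona-based training sample: (persona description, dialogue context, response).
Sample = Tuple[str, str, str]


def curriculum_train(
    model,                                       # any persona-based generation model
    originals: List[Sample],
    distill: Callable[[Sample], Sample],         # hypothetical: simplify a sample so it is easier to fit
    augment: Callable[[Sample], List[Sample]],   # hypothetical: diversify one distilled sample
    train_epoch: Callable[[object, List[Sample]], None],  # hypothetical: one pass of model updates
    stage1_epochs: int = 3,
    stage2_epochs: int = 3,
):
    """Two-stage data curriculum: augmented distilled samples first, originals second."""
    # Step 1: distill each original training sample into an easier-to-fit version.
    distilled = [distill(s) for s in originals]

    # Step 2: diversify the distilled data via augmentation.
    augmented = [a for s in distilled for a in augment(s)]

    # Step 3a: train first on the easier, augmented distilled data.
    for _ in range(stage1_epochs):
        train_epoch(model, augmented)

    # Step 3b: then continue training on the original (harder) samples.
    for _ in range(stage2_epochs):
        train_epoch(model, originals)

    return model
```

Note that the base model enters only through the training callback, which is what makes the approach model-agnostic: the same curriculum wraps a Transformer encoder-decoder or GPT2 unchanged.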
