Paper Title
Open-Domain Dialogue Generation Based on Pre-trained Language Models
Paper Author
Paper Abstract
Pre-trained language models have been successfully used in response generation for open-domain dialogue. Four main frameworks have been proposed: (1) Transformer-ED, which uses a Transformer encoder and decoder separately for the source and target sentences; (2) Transformer-Dec, which uses a Transformer decoder for both the source and target sentences; (3) Transformer-MLM, which uses a Transformer decoder with bidirectional attention on the source side and left-to-right attention on the target side, trained with a masked language model objective; and (4) Transformer-AR, which uses an auto-regressive objective instead. In this study, we compare these frameworks on three datasets, and our comparison reveals that the best framework uses bidirectional attention on the source side and does not separate the encoder and decoder. We also examine model discrepancies, and our experiments confirm that the performance of a model is directly impacted by the underlying discrepancies. We then propose two correction methods to reduce the discrepancies, and both improve model performance. These results show that discrepancy is an important factor to consider when using a pre-trained model, and that reducing discrepancies can lead to improved performance.
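The attention pattern described for framework (3) can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes PyTorch, and the function name prefix_lm_mask and the arguments src_len and tgt_len are illustrative. It builds a mask in which source tokens attend bidirectionally over the whole source, while target tokens attend to the source plus the target tokens generated so far.

import torch

def prefix_lm_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    # Boolean mask of shape (src_len + tgt_len, src_len + tgt_len);
    # mask[i, j] is True when position i may attend to position j.
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Every position attends to all source positions (bidirectional over the source).
    mask[:, :src_len] = True
    # Target positions additionally attend to themselves and earlier target positions
    # (left-to-right attention on the target side).
    mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len)).bool()
    return mask

# Example: 3 source tokens and 2 target tokens.
print(prefix_lm_mask(3, 2).int())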