Paper Title


Toward Self-learning End-to-End Task-Oriented Dialog Systems

Authors

Xiaoying Zhang, Baolin Peng, Jianfeng Gao, Helen Meng

Abstract


End-to-end task bots are typically learned over a static and usually limited-size corpus. However, when deployed in dynamic, changing, and open environments to interact with users, task bots tend to fail when confronted with data that deviate from the training corpus, i.e., out-of-distribution samples. In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations. We propose SL-AGENT, a novel self-learning framework for building end-to-end task bots. SL-AGENT consists of a dialog model and a pre-trained reward model to predict the quality of an agent response. It enables task bots to automatically adapt to changing environments by learning from the unlabeled human-bot dialog logs accumulated after deployment via reinforcement learning with the incorporated reward model. Experimental results on four well-studied dialog tasks show the effectiveness of SL-AGENT to automatically adapt to changing environments, using both automatic and human evaluations. We will release code and data for further research.
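The core loop the abstract describes, a frozen pre-trained reward model scoring agent responses while reinforcement learning pushes the dialog model toward higher-scoring ones, can be sketched in miniature. The toy below is not the paper's implementation: it replaces the neural dialog model with a softmax policy over three canned responses, uses a hypothetical hand-written `reward_model`, and applies an exact expected-reward policy gradient instead of sampled updates, purely to illustrate how reward-model feedback reshapes the policy without any human labels.

```python
import math

# Toy stand-ins for SL-AGENT's components (illustrative only):
# the real dialog model is a neural generator and the real reward
# model is pre-trained; both are hand-coded here.
RESPONSES = ["Booked the hotel.", "Sorry, I don't understand.", "Which city are you in?"]

def reward_model(context, response):
    # Hypothetical frozen reward model: scores response quality in [0, 1].
    # Here it simply prefers the clarifying question for an ambiguous request.
    return 1.0 if "city" in response else 0.1

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, context, lr=1.0):
    """One exact policy-gradient step on expected reward:
    d E[r] / d logit_j = p_j * (r_j - E[r])."""
    probs = softmax(logits)
    rewards = [reward_model(context, r) for r in RESPONSES]
    baseline = sum(p * r for p, r in zip(probs, rewards))  # E[r] as baseline
    return [l + lr * p * (r - baseline)
            for l, p, r in zip(logits, probs, rewards)]

# "Self-learning" from an unlabeled log: no human annotation is used,
# only the reward model's scores on the agent's candidate responses.
context = "user: I need a hotel"
logits = [0.0, 0.0, 0.0]  # start with a uniform policy
for _ in range(200):
    logits = policy_gradient_step(logits, context)

probs = softmax(logits)
print(RESPONSES[probs.index(max(probs))])  # the policy now favors the high-reward response
```

The baseline subtraction (using expected reward) is what keeps low-reward responses from being reinforced merely for having been probable; the same variance-reduction idea appears in standard policy-gradient training of dialog systems.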
