Paper Title
Improving Dialogue Breakdown Detection with Semi-Supervised Learning
Paper Authors
Paper Abstract
Building user trust in dialogue agents requires smooth and consistent dialogue exchanges. However, agents can easily lose conversational context and generate irrelevant utterances. These situations are called dialogue breakdowns: cases where an agent utterance prevents the user from continuing the conversation. Building systems to detect dialogue breakdown allows agents to recover appropriately or avoid breakdown entirely. In this paper, we investigate the use of semi-supervised learning methods to improve dialogue breakdown detection, including continued pre-training on the Reddit dataset and a manifold-based data augmentation method. We demonstrate the effectiveness of these methods on the Dialogue Breakdown Detection Challenge (DBDC) English shared task. Our submissions to the 2020 DBDC5 shared task place first, beating baselines and other submissions by over 12% accuracy. In ablations on DBDC4 data from 2019, our semi-supervised learning methods improve the performance of a baseline BERT model by 2% accuracy. These methods are generally applicable to any dialogue task and provide a simple way to improve model performance.
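The abstract does not spell out the manifold-based augmentation, so the sketch below assumes a mixup-style interpolation of intermediate hidden states and labels, a common manifold-level augmentation for text classifiers. The function name `manifold_mixup`, the `alpha` default, and the toy vectors are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def manifold_mixup(h_a, h_b, y_a, y_b, alpha=0.2, rng=None):
    """Mixup-style interpolation at the hidden-representation level.

    h_a, h_b: hidden-state vectors for two examples, taken from an
              intermediate encoder layer (e.g. a BERT layer).
    y_a, y_b: one-hot label distributions for the two examples.
    alpha:    Beta-distribution concentration controlling mix strength
              (illustrative default, not from the paper).
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    h_mix = lam * h_a + (1 - lam) * h_b   # interpolate representations
    y_mix = lam * y_a + (1 - lam) * y_b   # interpolate soft labels
    return h_mix, y_mix

# Toy usage: mix two examples' hidden states and labels.
h_a, h_b = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 0.0])
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
h_mix, y_mix = manifold_mixup(h_a, h_b, y_a, y_b)
```

The mixed pair `(h_mix, y_mix)` would then be fed to the layers above the interpolation point and trained with a soft-label loss, yielding extra synthetic training signal without new annotated dialogues.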