Paper Title
Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues
Authors
Abstract
The construction of open-domain dialogue systems requires high-quality dialogue datasets. Dialogue data admits a wide variety of responses for a given dialogue history, especially responses with different semantics. However, collecting such a high-quality dataset is, in most scenarios, labor-intensive and time-consuming. In this paper, we propose a data augmentation method that uses counterfactual inference to automatically generate high-quality responses with different semantics. Specifically, given an observed dialogue, our counterfactual generation model first infers semantically different responses by replacing the observed reply perspective with substituted ones. Our data selection method then filters out detrimental augmented responses. Experimental results show that our data augmentation method can produce high-quality responses with different semantics for a given dialogue history, and can outperform competitive baselines on multiple downstream tasks.