Paper Title
Multi-Sentence Knowledge Selection in Open-Domain Dialogue
Paper Authors
Paper Abstract
Incorporating external knowledge sources effectively into conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions about knowledge sources to simplify the overall task (Dinan et al., 2019), such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate the existing state of open-domain conversational knowledge selection, showing where current methodologies for data collection and evaluation are flawed. We then improve on them by proposing a new framework for collecting relevant knowledge, and create an augmented dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++. WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing the inherent ambiguity of open-domain dialogue knowledge selection. We then benchmark various knowledge ranking algorithms on this augmented dataset with both intrinsic evaluation and extrinsic measures of response quality, showing that neural rerankers trained on WOW++ can outperform rankers trained on standard datasets.
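The knowledge selection task the abstract describes (scoring candidate knowledge sentences against a dialogue context and returning a ranked list) can be illustrated with a minimal lexical-overlap baseline. This sketch is purely illustrative and is not the paper's neural reranker; the function name and scoring heuristic are assumptions made here for clarity.

```python
import re


def rank_knowledge(context: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Rank candidate knowledge sentences by token overlap with the dialogue
    context (a naive baseline, not the neural reranker from the paper)."""
    ctx_tokens = set(re.findall(r"\w+", context.lower()))

    def score(sentence: str) -> float:
        # Fraction of the candidate's tokens that also appear in the context.
        tokens = set(re.findall(r"\w+", sentence.lower()))
        return len(ctx_tokens & tokens) / (len(tokens) or 1)

    return sorted(candidates, key=score, reverse=True)[:top_k]


context = "i really love science fiction movies"
candidates = [
    "Science fiction is a genre of speculative fiction.",
    "The Eiffel Tower is located in Paris.",
    "Movies are also commonly called films.",
]
ranked = rank_knowledge(context, candidates, top_k=2)
# The science-fiction sentence shares the most tokens with the context.
```

A multi-sentence dataset such as WOW++ changes the evaluation of such rankers: rather than checking whether the single gold sentence is ranked first, one can credit a ranker for surfacing any of the several annotated relevant sentences near the top of the list.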