Paper Title
Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality
Paper Authors
Paper Abstract
Human communication relies on common ground (CG), the mutual knowledge and beliefs shared by participants, to produce coherent and interesting conversations. In this paper, we demonstrate that current response generation (RG) models produce generic and dull responses in dialogues because they act reflexively, failing to explicitly model CG, both due to the lack of CG in training data and the standard RG training procedure. We introduce Reflect, a dataset that annotates dialogues with explicit CG (materialized as inferences approximating shared knowledge and beliefs) and solicits 9k diverse human-generated responses, each following one common ground inference. Using Reflect, we showcase the limitations of current dialogue data and RG models: less than half of the responses in current data are rated as high quality (sensible, specific, and interesting), and models trained on this data produce responses of even lower quality, while most Reflect responses are judged high quality. Next, we analyze whether CG can help models produce better responses by using Reflect CG to guide RG models. Surprisingly, we find that simply prompting GPT3 to "think" about CG generates 30% more high-quality responses, showing promising benefits of integrating CG into the RG process.
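For illustration, below is a minimal sketch of the "think about common ground before responding" prompting idea the abstract describes. It assumes the legacy OpenAI completions API (openai<1.0) and paraphrased two-step prompts; the paper's exact prompts, model, and decoding settings are not given in the abstract, so this is indicative only, not the authors' implementation.

```python
# Minimal sketch: prompt the model to first infer common ground (CG),
# then condition the dialogue response on that inference.
# Assumptions: legacy OpenAI completions API; prompts and model name are
# illustrative placeholders, not the paper's actual setup.
import openai


def respond_with_common_ground(dialogue_history: str) -> str:
    # Step 1: elicit a CG inference (shared knowledge/beliefs) about the dialogue.
    cg_prompt = (
        f"Dialogue:\n{dialogue_history}\n\n"
        "What is the listener likely to infer or believe about the speaker here?"
    )
    cg = openai.Completion.create(
        engine="text-davinci-002", prompt=cg_prompt, max_tokens=64
    ).choices[0].text.strip()

    # Step 2: generate the response conditioned on the CG inference,
    # rather than replying reflexively to the surface dialogue alone.
    rg_prompt = (
        f"Dialogue:\n{dialogue_history}\n\n"
        f"Inferred common ground: {cg}\n\n"
        "Write the next response, grounded in the inference above:"
    )
    return openai.Completion.create(
        engine="text-davinci-002", prompt=rg_prompt, max_tokens=64
    ).choices[0].text.strip()
```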