Paper Title
Logical Reasoning for Task Oriented Dialogue Systems
Paper Authors
Paper Abstract
In recent years, large pretrained models have been used in dialogue systems to improve successful task completion rates. However, the lack of reasoning capabilities in dialogue platforms makes it difficult to provide relevant and fluent responses, unless the designers of a conversational experience spend a considerable amount of time implementing these capabilities in external rule-based modules. In this work, we propose a novel method to fine-tune pretrained transformer models, such as RoBERTa and T5, to reason over a set of facts in a given dialogue context. Our method includes a synthetic data generation mechanism which helps the model learn logical relations, such as comparison between lists of numerical values, inverse relations (and negation), inclusion and exclusion for categorical attributes, application of a combination of attributes over both numerical and categorical values, and spoken forms for numerical values, without the need for an additional training dataset. We show that the transformer-based model can perform logical reasoning to answer questions when the dialogue context contains all the required information; otherwise, it is able to extract appropriate constraints to pass to downstream components (e.g., a knowledge base) when only partial information is available. We observe that transformer-based models such as UnifiedQA-T5 can be fine-tuned to perform logical reasoning (such as comparison of numerical and categorical attributes) over attributes that were not seen at training time (e.g., accuracy of 90\%+ for smaller-than comparison of $k_{\max}$=5 values over a held-out test dataset).
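To make the synthetic data generation idea concrete, the following is a minimal sketch of how (context, question, answer) triples for a smaller-than comparison over up to $k_{\max}$=5 numerical values might be generated. The entity names, the `price` attribute, and the sentence templates are invented for illustration; the paper's actual generation templates and attribute sets are not specified here.

```python
import random

# Hypothetical attribute and entity pool; the real paper's templates may differ.
ATTRIBUTE = "price"
ENTITIES = ["A", "B", "C", "D", "E"]
K_MAX = 5  # compare at most K_MAX values, mirroring k_max = 5 in the abstract


def make_example(rng, k):
    """Build one synthetic (context, question, answer) triple for
    smaller-than comparison over k numerical values."""
    names = rng.sample(ENTITIES, k)
    values = rng.sample(range(10, 100), k)
    # Facts stated in natural language form the dialogue context.
    context = ". ".join(
        f"restaurant {n} has {ATTRIBUTE} {v} dollars"
        for n, v in zip(names, values)
    )
    question = f"which restaurant has the smallest {ATTRIBUTE}?"
    # Gold answer: the entity holding the minimum value.
    answer = names[min(range(k), key=lambda i: values[i])]
    return {"context": context,
            "question": question,
            "answer": f"restaurant {answer}"}


rng = random.Random(0)
dataset = [make_example(rng, rng.randint(2, K_MAX)) for _ in range(1000)]
```

Triples like these can then be serialized into the question-answering input format of a model such as UnifiedQA-T5 for fine-tuning; because the logical relation (not the attribute vocabulary) is what the model must learn, held-out attributes can be used to test generalization.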