Paper Title
Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model
Paper Authors
Paper Abstract
Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given a multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore the knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component selects the knowledge relevant to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component seamlessly integrates the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, in which an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly exploit the selected knowledge and advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
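The abstract's most concrete architectural detail is the dot-product knowledge-decoder attention sub-layer added to the BART decoder, where decoder states act as queries over embeddings of the selected knowledge. The abstract gives no equations, so the following is only a minimal NumPy sketch of generic scaled dot-product cross-attention under that reading; all names (`knowledge_decoder_attention`, the projection matrices `Wq`/`Wk`/`Wv`) and shapes are illustrative assumptions, not DKMD's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_decoder_attention(decoder_states, knowledge_embs, Wq, Wk, Wv):
    """Hypothetical sketch: decoder hidden states (queries) attend over
    selected-knowledge embeddings (keys/values) via scaled dot-product
    attention, yielding knowledge-aware decoder states."""
    Q = decoder_states @ Wq                    # (T, d) query projections
    K = knowledge_embs @ Wk                    # (N, d) key projections
    V = knowledge_embs @ Wv                    # (N, d) value projections
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # (T, N) similarity scores
    weights = softmax(scores, axis=-1)         # attention over N knowledge entries
    return weights @ V                         # (T, d) knowledge-attended output

# Toy shapes: T=4 decoding steps, N=3 selected knowledge entries, d=8 hidden dim.
rng = np.random.default_rng(0)
d = 8
dec = rng.normal(size=(4, d))
kn = rng.normal(size=(3, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = knowledge_decoder_attention(dec, kn, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

In a full decoder layer this sub-layer would presumably sit alongside the standard self-attention and encoder-decoder attention sub-layers, each followed by a residual connection and layer normalization, so the knowledge signal is injected at every decoding step.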