Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations

Paper Title

Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations

Authors

Tran, Ke; Tan, Ming

Abstract

Modern conversational AI systems support natural language understanding for a wide variety of capabilities. While a majority of these tasks can be accomplished using a simple, flat representation of intents and slots, more sophisticated capabilities require complex hierarchical representations supported by semantic parsing. State-of-the-art semantic parsers are trained using supervised learning on data labeled according to a hierarchical schema, which may be costly to obtain or not readily available for a new domain. In this work, we explore the possibility of generating synthetic data for neural semantic parsing using a pretrained denoising sequence-to-sequence model (i.e., BART). Specifically, we first extract masked templates from the existing labeled utterances, and then fine-tune BART to generate synthetic utterances conditioned on the extracted templates. Finally, we use an auxiliary parser (AP) to filter the generated utterances. The AP guarantees the quality of the generated data. We show the potential of our approach when evaluating on the navigation domain of the Facebook TOP dataset.
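The template-extraction and filtering steps described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The masking scheme (collapsing each run of plain words into a single `<mask>` token while keeping the TOP-style `[IN:X` / `[SL:Y` brackets), the helper names `extract_template`, `skeleton`, and `filter_synthetic`, and the acceptance rule (keep a generated utterance only if the auxiliary parser's output has the same intent/slot skeleton as its template) are all hypothetical. The BART fine-tuning and generation step is omitted, and the auxiliary parser is replaced by a dictionary lookup over illustrative parses.

```python
from typing import Callable, List

MASK = "<mask>"  # assumed mask token; BART's own mask token is also spelled "<mask>"


def tokenize(tree: str) -> List[str]:
    # Split a TOP-style bracketed parse into tokens such as "[IN:X", "[SL:Y", "]" and words.
    return tree.replace("]", " ] ").split()


def is_label(tok: str) -> bool:
    # Structural tokens: an opening intent/slot bracket or a closing bracket.
    return tok.startswith("[") or tok == "]"


def extract_template(tree: str) -> str:
    """Hypothetical masking scheme: keep every intent/slot bracket and
    replace each maximal run of plain words with one <mask> token."""
    out: List[str] = []
    for tok in tokenize(tree):
        if is_label(tok):
            out.append(tok)
        elif not out or out[-1] != MASK:
            out.append(MASK)  # start of a new run of words
        # otherwise: word continues a masked run, so it is dropped
    return " ".join(out)


def skeleton(tree: str) -> str:
    # Labels and nesting only; used to compare a parse against a template.
    return " ".join(t for t in tokenize(tree) if is_label(t))


def filter_synthetic(candidates: List[str],
                     parse: Callable[[str], str],
                     template: str) -> List[str]:
    """Assumed AP filter: keep an utterance only if the auxiliary parser
    recovers the same intent/slot skeleton as the template it came from."""
    target = skeleton(template)
    return [u for u in candidates if skeleton(parse(u)) == target]


if __name__ == "__main__":
    gold = ("[IN:GET_DIRECTIONS Directions to [SL:DESTINATION [IN:GET_EVENT the "
            "[SL:NAME_EVENT Eagles ] [SL:CAT_EVENT game ] ] ] ]")
    template = extract_template(gold)
    print(template)

    # Illustrative stand-in for a trained auxiliary parser (not real model output).
    toy_parses = {
        "how do I get to the Phillies game":
            "[IN:GET_DIRECTIONS how do I get to [SL:DESTINATION [IN:GET_EVENT the "
            "[SL:NAME_EVENT Phillies ] [SL:CAT_EVENT game ] ] ] ]",
        "is there traffic downtown":
            "[IN:GET_INFO_TRAFFIC is there traffic [SL:LOCATION downtown ] ]",
    }
    kept = filter_synthetic(list(toy_parses), toy_parses.__getitem__, template)
    print(kept)  # prints ['how do I get to the Phillies game']
```

Comparing only the label skeleton is one simple way to check that a generated utterance still "fits" its template; a stricter filter could also require that the parser's slot spans line up with the masked positions.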
