MK-SQuIT: Synthesizing Questions using Iterative Template-filling

Paper title

MK-SQuIT: Synthesizing Questions using Iterative Template-filling

Authors

Spiegel, Benjamin A., Cheong, Vincent, Kaplan, James E., Sanchez, Anthony

Abstract

The aim of this work is to create a framework for synthetically generating question/query pairs with as little human input as possible. These datasets can be used to train machine translation systems to convert natural language questions into queries, a useful tool that could allow for more natural access to database information. Existing methods of dataset generation require human input that scales linearly with the size of the dataset, resulting in small datasets. Aside from a short initial configuration task, no human input is required during the query generation process of our system. We leverage WikiData, a knowledge base of RDF triples, as a source for generating the main content of questions and queries. Using multiple layers of question templating, we are able to sidestep some of the most challenging parts of query generation that have been handled by humans in previous methods; humans never have to modify, aggregate, inspect, annotate, or generate any questions or queries at any step in the process. Our system is easily configurable to multiple domains and can be modified to generate queries in natural languages other than English. We also present an example dataset of 110,000 question/query pairs across four WikiData domains. We then present a baseline model, trained on this dataset, which shows promise in a commercial QA setting.
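
As a rough illustration of the template-filling idea described in the abstract, the sketch below fills a question template and a matching SPARQL query template from the same slot values, so each question is paired with its query without manual annotation. This is not the authors' implementation: the template strings, the `generate_pairs` helper, and the small entity and property lists are assumptions made for the example, and the paper's multi-layer templating is reduced here to a single layer.

```python
from itertools import product

# Hypothetical template pair: the same slot values fill both the natural
# language question and the SPARQL query, which keeps the two aligned.
QUESTION_TEMPLATE = "What is the {property_label} of {entity_label}?"
# Doubled braces are literal braces in str.format.
QUERY_TEMPLATE = "SELECT ?answer WHERE {{ wd:{entity_id} wdt:{property_id} ?answer . }}"

# Small illustrative samples of Wikidata identifiers and labels (assumed
# inputs; a real system would pull these from the knowledge base).
ENTITIES = [("Q42", "Douglas Adams"), ("Q937", "Albert Einstein")]
PROPERTIES = [("P19", "place of birth"), ("P569", "date of birth")]


def generate_pairs(entities, properties):
    """Return (question, query) pairs for every entity/property combination."""
    pairs = []
    for (entity_id, entity_label), (property_id, property_label) in product(
        entities, properties
    ):
        slots = {
            "entity_id": entity_id,
            "entity_label": entity_label,
            "property_id": property_id,
            "property_label": property_label,
        }
        # Each template only uses the slots it names; extras are ignored.
        question = QUESTION_TEMPLATE.format(**slots)
        query = QUERY_TEMPLATE.format(**slots)
        pairs.append((question, query))
    return pairs


if __name__ == "__main__":
    for question, query in generate_pairs(ENTITIES, PROPERTIES):
        print(question)
        print("   ", query)
```

Because the number of generated pairs grows with the product of the slot lists, the human effort stays fixed at writing the templates, which is the scaling property the abstract emphasizes.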
