AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data

Paper Title

AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data

Authors

Silei Xu, Sina J. Semnani, Giovanni Campagna, Monica S. Lam

Abstract

We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.
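
The two ideas named in the abstract, template-based question synthesis and a filtered auto-paraphraser, can be illustrated with a toy sketch. This is not AutoQA's implementation: the schema, the templates, the logical-form syntax, and the functions `synthesize`, `filter_paraphrases`, `toy_paraphrase`, and `toy_parse` are all hypothetical. The abstract does not say how the filter decides which paraphrases are correct, so the sketch assumes one plausible rule: a paraphrase is kept only if a parser maps it back to the logical form of the question it came from.

```python
from __future__ import annotations
from typing import Callable, Optional

# Hypothetical toy schema: each attribute has phrases in different parts of speech.
SCHEMA = {
    "table": "restaurant",
    "attributes": {
        "cuisine": ["that serve {value} food", "with {value} cuisine"],
        "city": ["in {value}", "located in {value}"],
    },
    "values": {"cuisine": ["Italian"], "city": ["Palo Alto"]},
}


def synthesize(schema: dict) -> list[tuple[str, str]]:
    """Generate (question, logical form) pairs by filling templates from the schema."""
    pairs = []
    table = schema["table"]
    for attr, phrases in schema["attributes"].items():
        for value in schema["values"][attr]:
            # Made-up logical-form syntax, used only for this illustration.
            form = f'{table} filter {attr} == "{value}"'
            for phrase in phrases:
                question = f"show me {table}s " + phrase.format(value=value)
                pairs.append((question, form))
    return pairs


def filter_paraphrases(
    pairs: list[tuple[str, str]],
    paraphrase: Callable[[str], list[str]],
    parse: Callable[[str], Optional[str]],
) -> list[tuple[str, str]]:
    """Keep a paraphrase only if the parser recovers the original logical form (assumed rule)."""
    kept = []
    for question, form in pairs:
        for candidate in paraphrase(question):
            if parse(candidate) == form:
                kept.append((candidate, form))
    return kept


def toy_paraphrase(question: str) -> list[str]:
    """Stand-in for a neural paraphraser: one faithful rewrite, one that loses the constraint."""
    return [question.replace("show me", "find"), "show me restaurants"]


def toy_parse(question: str) -> Optional[str]:
    """Stand-in for a trained semantic parser: looks for a known value in the question."""
    for attr, values in SCHEMA["values"].items():
        for value in values:
            if value in question:
                return f'{SCHEMA["table"]} filter {attr} == "{value}"'
    return None


if __name__ == "__main__":
    synthetic = synthesize(SCHEMA)
    for question, form in filter_paraphrases(synthetic, toy_paraphrase, toy_parse):
        print(question, "=>", form)
```

In this sketch the paraphrase that drops the constraint ("show me restaurants") is discarded because the stand-in parser cannot recover the original logical form from it, while the faithful rewrites are kept as additional training pairs.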
