Paper Title

AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data

Authors

Silei Xu, Sina J. Semnani, Giovanni Campagna, Monica S. Lam

Abstract

We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.
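
To make the idea of a filtered auto-paraphraser concrete, the sketch below keeps a generated paraphrase only when a parser still maps it to the logical form of the sentence it came from. The abstract does not state the filter's criterion, so this criterion is an assumption, and the names `filter_paraphrases` and `toy_parser`, the keyword-matching parser, and the example logical-form strings are all hypothetical, not AutoQA's actual implementation.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical type: a parser maps an utterance to a logical-form string.
Parser = Callable[[str], str]


def filter_paraphrases(
    pairs: Iterable[Tuple[str, str, str]],
    parser: Parser,
) -> List[Tuple[str, str]]:
    """Keep (paraphrase, logical_form) pairs whose parse matches the original.

    Each input triple is (original_sentence, paraphrase, logical_form).
    Assumed criterion: a paraphrase counts as correct if the parser still
    predicts the logical form attached to the original sentence.
    """
    kept = []
    for _original, paraphrase, logical_form in pairs:
        if parser(paraphrase) == logical_form:
            kept.append((paraphrase, logical_form))
    return kept


def toy_parser(utterance: str) -> str:
    """Toy stand-in for a trained semantic parser (keyword matching only)."""
    text = utterance.lower()
    if "restaurant" in text and ("rating" in text or "rated" in text):
        return "restaurants sort by rating desc"
    return "unknown"


# Two candidate paraphrases of the same synthetic sentence (made-up data).
candidates = [
    (
        "show restaurants with the best rating",
        "which restaurants are rated highest",
        "restaurants sort by rating desc",
    ),
    (
        "show restaurants with the best rating",
        "where can I buy a good rating",  # meaning drifted; should be dropped
        "restaurants sort by rating desc",
    ),
]

kept = filter_paraphrases(candidates, toy_parser)
for paraphrase, logical_form in kept:
    print(f"{paraphrase!r} -> {logical_form!r}")
```

In this toy run only the first paraphrase survives, because the second one no longer parses to the original logical form. In a real pipeline the parser would be a trained model rather than keyword matching.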
