论文标题
PATSQL:从示例表中有效合成SQL查询,并快速推断投影列
PATSQL: Efficient Synthesis of SQL Queries from Example Tables with Quick Inference of Projected Columns
论文作者
论文摘要
SQL是最受欢迎的数据分析工具之一,现在越来越多的用户使用它而没有在数据库中具有专业知识。几项研究提出了典型编程方法,以帮助此类非专家编写正确的SQL查询。尽管现有方法支持各种SQL功能,例如聚合和嵌套查询,但随着示例表的规模的增加,它们的计算成本显着增加。在本文中,我们提出了一种有效的算法,利用关系代数已知的属性从输入和输出表中合成SQL查询。我们的关键见解是,可以通过在关系代数中应用转换规则,同时保留程序的语义,从而将程序草图中的投影操作员抬高到其他操作员之上。这可以快速推断投影操作员中适当的列,这是综合中的重要组成部分,但会导致与先前工作的组合爆炸。我们还引入了一种新颖的约束形式及其自上而下的传播机制,以进行有效的草图完成。我们在工具PATSQL中实现了该算法,并对先前基准和Kaggle的教程的226个查询进行了评估。结果,Patsql解决了68%的基准,并在一秒钟内发现了89%的解决方案。我们的工具可从https://naist-se.github.io/patsql/获得。
SQL is one of the most popular tools for data analysis, and it is now used by an increasing number of users without having expertise in databases. Several studies have proposed programming-by-example approaches to help such non-experts to write correct SQL queries. While existing methods support a variety of SQL features such as aggregation and nested query, they suffer a significant increase in computational cost as the scale of example tables increases. In this paper, we propose an efficient algorithm utilizing properties known in relational algebra to synthesize SQL queries from input and output tables. Our key insight is that a projection operator in a program sketch can be lifted above other operators by applying transformation rules in relational algebra, while preserving the semantics of the program. This enables a quick inference of appropriate columns in the projection operator, which is an essential component in synthesis but causes combinatorial explosions in prior work. We also introduce a novel form of constraints and its top-down propagation mechanism for efficient sketch completion. We implemented this algorithm in our tool PATSQL and evaluated it on 226 queries from prior benchmarks and Kaggle's tutorials. As a result, PATSQL solved 68% of the benchmarks and found 89% of the solutions within a second. Our tool is available at https://naist-se.github.io/patsql/.