Paper Title
Integration of Skyline Queries into Spark SQL
Authors
Abstract
Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing large, distributed data sets, and its Spark SQL module provides a convenient SQL-like interface. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We therefore provide a full-fledged integration of the skyline operator into Spark SQL, which allows skyline queries to be written in a simple, easy-to-use syntax. Moreover, our empirical results show that this integrated solution outperforms, by a wide margin, a solution based on rewriting skyline queries into standard SQL.
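To make the "tedious rewriting" concrete, the following is a minimal sketch of a skyline query and its plain-SQL rewrite. It is an illustration only and not the paper's implementation: the `hotels` table, its columns, and the sample rows are hypothetical, both dimensions are assumed to be minimized, and SQLite (via Python's standard `sqlite3` module) stands in for Spark SQL so the example runs without a cluster.

```python
import sqlite3

# Hypothetical data: (name, price, distance_km); lower is better in both dimensions.
HOTELS = [
    ("A", 50, 8.0),
    ("B", 80, 2.0),
    ("C", 90, 3.0),   # dominated by B (B is cheaper and closer)
    ("D", 120, 0.5),
    ("E", 60, 9.0),   # dominated by A (A is cheaper and closer)
]


def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(rows):
    """Naive O(n^2) reference skyline: keep the rows that no other row dominates."""
    return [r for r in rows if not any(dominates(o[1:], r[1:]) for o in rows)]


# The same query rewritten into plain SQL with a correlated NOT EXISTS subquery.
# Every skyline dimension appears twice in the subquery, which is what makes
# the rewrite verbose as the number of dimensions grows.
REWRITTEN_SQL = """
SELECT h.name, h.price, h.distance_km
FROM hotels AS h
WHERE NOT EXISTS (
    SELECT 1 FROM hotels AS o
    WHERE o.price <= h.price AND o.distance_km <= h.distance_km
      AND (o.price < h.price OR o.distance_km < h.distance_km)
)
ORDER BY h.name
"""


def skyline_via_sql(rows):
    """Load the rows into an in-memory SQLite table and run the rewritten query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT, price INTEGER, distance_km REAL)")
    conn.executemany("INSERT INTO hotels VALUES (?, ?, ?)", rows)
    result = conn.execute(REWRITTEN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(skyline(HOTELS))
    print(skyline_via_sql(HOTELS))
```

Both functions return the same set of non-dominated rows. The point of the sketch is the shape of the rewritten query: the dominance condition has to be spelled out by hand for every dimension, which is the burden a dedicated skyline operator is meant to remove.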