Paper Title

Integration of Skyline Queries into Spark SQL

Authors

Lukas Grasmann, Reinhard Pichler, Alexander Selzer

Abstract

Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing large, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy-to-use syntax for expressing skyline queries. Moreover, our empirical results show that this integrated solution outperforms, by a wide margin, a solution based on rewriting into standard SQL.
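For readers unfamiliar with the operator, the sketch below illustrates what a skyline query computes: the set of tuples that are not dominated by any other tuple, where one tuple dominates another if it is at least as good in every dimension and strictly better in at least one. This is a naive quadratic illustration in plain Python and is not the authors' Spark SQL implementation; the names `dominates` and `skyline`, the hotel data, and the choice to minimize every dimension are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if `a` dominates `b`, assuming smaller is better in every dimension."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep each point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach), both to be minimized.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (55, 9)]
result = skyline(hotels)
print(result)  # [(50, 8), (60, 2), (80, 1)]
```

The rewriting into standard SQL that the abstract mentions usually expresses the same "no other tuple dominates this one" condition as a correlated `NOT EXISTS` subquery over the same table, which is the kind of formulation a dedicated skyline operator is meant to replace.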
