Paper Title
Extensible Database Simulator for Fast Prototyping In-Database Algorithms
Paper Authors
Abstract
With the rapid growth of data scale, in-database analytics and learning has become one of the most studied topics in the data science community, because of its significance in narrowing the gap between the management and the analysis of data. By extending the analytics and learning capabilities of databases, data scientists can save much of the time spent exchanging data between databases and external analytic tools. To this end, researchers are attempting to integrate more data science algorithms into databases. However, implementing algorithms in mainstream databases is very time-consuming, especially when it requires a deep dive into the database kernel. There is thus demand for an easy-to-extend database simulator that helps quickly prototype and verify in-database algorithms before implementing them in real databases. In this demo, we present such an extensible relational database simulator, DBSim, to help data scientists prototype their in-database analytics and learning algorithms and verify the effectiveness of their ideas at minimal cost. DBSim simulates a real relational database by integrating all the major components of a mainstream RDBMS, including a SQL parser, relational operators, a query optimizer, etc. In addition, DBSim provides various interfaces that let users flexibly plug custom extension modules into any of the major components without modifying the kernel. Through these interfaces, DBSim supports easy extension of SQL syntax, relational operators, query optimizer rules and cost models, and physical plan execution. Furthermore, DBSim provides utilities that facilitate development and debugging, such as a query plan visualizer and an interactive analyzer for optimization rules. We develop DBSim in pure Python to support seamless implementation of most data science algorithms, since many of them are written in Python.
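To make the idea of plugging a custom optimizer rule into a simulator without touching its kernel more concrete, the following is a minimal sketch in pure Python. It is not DBSim's actual API: the abstract does not describe the interfaces, so every name here (`Scan`, `Filter`, `register_rule`, `merge_filters`, `optimize`) is a hypothetical stand-in chosen for illustration. The sketch assumes a toy logical plan of two node types and a rule registry that the optimizer consults.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical logical plan nodes; not taken from DBSim.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    predicate: str
    child: object


# A rule receives a plan node and returns a rewritten node,
# or None when the rule does not apply to that node.
Rule = Callable[[object], Optional[object]]

# Registry that the (hypothetical) optimizer reads; extensions only append to it.
RULES: List[Rule] = []


def register_rule(rule: Rule) -> Rule:
    """Decorator that plugs a user-defined rule into the optimizer."""
    RULES.append(rule)
    return rule


@register_rule
def merge_filters(node):
    """Custom extension rule: collapse two stacked filters into one."""
    if isinstance(node, Filter) and isinstance(node.child, Filter):
        merged = f"({node.predicate}) AND ({node.child.predicate})"
        return Filter(merged, node.child.child)
    return None


def optimize(node):
    """Rewrite the plan bottom-up, applying rules until none of them fires."""
    if isinstance(node, Filter):
        node = Filter(node.predicate, optimize(node.child))
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            rewritten = rule(node)
            if rewritten is not None:
                node = rewritten
                changed = True
    return node


plan = Filter("a > 1", Filter("b < 5", Scan("t")))
optimized = optimize(plan)
print(optimized)
```

The design choice illustrated is that `optimize` never names a specific rule: a new rule is added by decorating a function, which is one plausible way to extend an optimizer while leaving its core loop unmodified.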