Paper Title

Python Implementation of the Dynamic Distributed Dimensional Data Model

Authors

Hayden Jananthan, Lauren Milechin, Michael Jones, William Arcand, William Bergeron, David Bestor, Chansup Byun, Michael Houle, Matthew Hubbell, Vijay Gadepally, Anna Klein, Peter Michaleas, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

Abstract

Python has become a standard scientific computing language, with fast-growing support for machine learning and data analysis modules and increasing use with big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a highly composable, unified data model with strong performance, built to handle big data quickly and efficiently. In this work we present an implementation of D4M in Python. D4M.py implements all foundational functionality of D4M and includes Accumulo and SQL database support via Graphulo. We describe the mathematical background and motivation, explain the approaches taken for its fundamental functions and building blocks, and report performance results comparing D4M.py with D4M-MATLAB and D4M.jl.
