Paper Title
AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Paper Authors
Paper Abstract
Distributed Stream Processing Systems (DSPSs) are among the most rapidly emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. The major market players in this domain are clearly Apache Spark and Flink, which provide a variety of frontend APIs for SQL, statistical inference, machine learning, stream processing, and many others. Yet rather few details are reported on how these engines integrate with the underlying High-Performance Computing (HPC) infrastructure and on the communication protocols they use. Spark and Flink, for example, are implemented in Java and still rely on a dedicated master node to manage the control flow among the worker nodes in a compute cluster. In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++ using the Message Passing Interface (MPI) and pthreads for multithreading, and is deployed directly on top of a common HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding protocol (referred to as "Asynchronous Iterative Routing") that facilitates direct and asynchronous communication among all client nodes and thereby completely avoids the overhead induced by the control flow through a master node, which may otherwise form a performance bottleneck. Our experiments over a variety of benchmark settings confirm that AIR outperforms Spark and Flink in terms of latency and throughput by a factor of up to 15; moreover, we demonstrate that AIR scales out much better than existing DSPSs to clusters consisting of up to 8 nodes and 224 cores.
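
The following is a minimal, self-contained C++/MPI sketch of the master-less, asynchronous communication pattern the abstract describes; it is an illustration only, not AIR's actual source code or protocol. Every rank hash-shards its key/value tuples locally and sends them directly to the owning rank with non-blocking MPI calls, then polls for incoming tuples, so no dedicated master participates in the data or control path. The tuple format, the std::hash-based sharding, and the MPI_Alltoall count exchange used to terminate the receive loop are assumptions made purely for this example.

// Sketch (assumptions, not AIR's source) of direct, asynchronous,
// hash-sharded routing among MPI ranks without a master node.
#include <mpi.h>

#include <cstdio>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

int main(int argc, char** argv) {
  int provided = 0;
  // AIR combines MPI with pthreads, so a thread-capable MPI environment
  // would be requested; this sketch itself stays single-threaded.
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  int rank = 0, world = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &world);

  // Each rank produces a few key/value tuples (illustrative data only).
  std::vector<std::string> outgoing = {
      "user" + std::to_string(rank) + ":1", "clicks:2", "views:3"};

  // Hash-shard: the owner of a key is determined locally, with no master.
  std::vector<int> send_counts(world, 0);
  std::vector<int> targets;
  for (const auto& tuple : outgoing) {
    const std::string key = tuple.substr(0, tuple.find(':'));
    const int target = static_cast<int>(std::hash<std::string>{}(key) % world);
    targets.push_back(target);
    ++send_counts[target];
  }

  // Exchange message counts so every rank knows how many tuples to expect.
  // (Used here only to terminate the sketch cleanly; a streaming engine
  // keeps its receive loop running continuously.)
  std::vector<int> recv_counts(world, 0);
  MPI_Alltoall(send_counts.data(), 1, MPI_INT, recv_counts.data(), 1, MPI_INT,
               MPI_COMM_WORLD);
  const int expected =
      std::accumulate(recv_counts.begin(), recv_counts.end(), 0);

  // Asynchronous, direct sends to the target ranks.
  std::vector<MPI_Request> requests(outgoing.size());
  for (size_t i = 0; i < outgoing.size(); ++i) {
    MPI_Isend(outgoing[i].data(), static_cast<int>(outgoing[i].size()),
              MPI_CHAR, targets[i], /*tag=*/0, MPI_COMM_WORLD, &requests[i]);
  }

  // Poll for incoming tuples from any rank and process them as they arrive.
  int received = 0;
  while (received < expected) {
    MPI_Status status;
    int flag = 0;
    MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &flag, &status);
    if (!flag) continue;  // nothing pending; a real engine would do other work
    int len = 0;
    MPI_Get_count(&status, MPI_CHAR, &len);
    std::string tuple(len, '\0');
    MPI_Recv(tuple.data(), len, MPI_CHAR, status.MPI_SOURCE, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    std::printf("rank %d received '%s' from rank %d\n", rank, tuple.c_str(),
                status.MPI_SOURCE);
    ++received;
  }

  MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
              MPI_STATUSES_IGNORE);
  MPI_Finalize();
  return 0;
}

Under a workload manager such as SLURM, an MPI binary of this kind could be launched directly across the allocated nodes, e.g. with srun -n <ranks> ./sketch, without any dedicated master process.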