Paper Title
Zero-Shot Cost Models for Distributed Stream Processing
Paper Authors
Abstract
This paper proposes a learned cost estimation model for Distributed Stream Processing Systems (DSPS) that aims to provide accurate cost predictions for executing queries. A major premise of this work is that the proposed learned model can generalize to the dynamics of streaming workloads out of the box. That is, a model trained once can accurately predict performance metrics such as latency and throughput even if the characteristics of the data and workload, or the deployment of operators to hardware, change at runtime. The model can therefore be used for tasks such as optimizing the placement of operators to minimize the end-to-end latency of a streaming query or to maximize its throughput, even under varying conditions. Our evaluation on a well-known DSPS, Apache Storm, shows that the model predicts accurately for unseen workloads and queries and generalizes across real-world benchmarks.