Title

Prediction for Distributional Outcomes in High-Performance Computing I/O Variability

Authors

Xu, Li; Hong, Yili; Morris, Max D.; Cameron, Kirk W.

Abstract

Although high-performance computing (HPC) systems have been scaled to meet the exponentially growing demand for scientific computing, HPC performance variability remains a major challenge and has become a critical research topic in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC performance variability management and is nontrivial because one needs to predict a distribution function based on system factors. In this paper, we propose a new framework to predict performance distributions. The proposed model is a modified Gaussian process that can predict the distribution function of the input/output (I/O) throughput under a specific HPC system configuration. We also impose a monotonic constraint so that the predicted function is nondecreasing, as any cumulative distribution function must be. Additionally, the proposed model can incorporate both quantitative and qualitative input variables. We evaluate the performance of the proposed method on various prediction tasks using the IOzone variability data. Results show that the proposed method generates accurate predictions and outperforms existing methods. We also show how the predicted functional output can be used to generate predictions for scalar summaries of the performance distribution, such as the mean, standard deviation, and quantiles. Our method can further be used as a surrogate model for HPC system variability monitoring and optimization.
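The abstract notes that a predicted cumulative distribution function (CDF) can be turned into scalar summaries such as the mean, standard deviation, and quantiles. The sketch below illustrates that step under stated assumptions: the predicted CDF is available as values on an increasing grid of throughput values, and `summarize_cdf` is a hypothetical helper, not the authors' code. Its running-maximum step is only a simple post-hoc way to force an unconstrained prediction to be nondecreasing; it is not the paper's monotone-constrained Gaussian process, which builds that property into the model itself.

```python
import numpy as np


def summarize_cdf(x, F, probs=(0.25, 0.5, 0.75)):
    """Mean, standard deviation, and quantiles from a CDF on a grid.

    Hypothetical helper for illustration.
    x: increasing grid of throughput values.
    F: predicted CDF values F(x) on that grid (ideally F[0] ~ 0, F[-1] ~ 1).
    """
    x = np.asarray(x, dtype=float)
    # Post-hoc repair (illustrative only): the running maximum makes F
    # nondecreasing, and clipping keeps it inside [0, 1].
    F = np.clip(np.maximum.accumulate(np.asarray(F, dtype=float)), 0.0, 1.0)

    # Probability mass in each grid cell, placed at the cell midpoint.
    mass = np.diff(F)
    total = mass.sum()
    if total <= 0:
        raise ValueError("CDF is flat on the grid; no probability mass")
    mass = mass / total  # renormalize grid mass (used for mean and sd only)
    mid = 0.5 * (x[:-1] + x[1:])

    mean = float(np.sum(mid * mass))
    sd = float(np.sqrt(np.sum((mid - mean) ** 2 * mass)))

    # Generalized inverse CDF: drop flat stretches (keep the first grid point
    # of each CDF level) so the interpolation nodes are strictly increasing.
    levels, first = np.unique(F, return_index=True)
    quantiles = np.interp(probs, levels, x[first])
    return mean, sd, quantiles
```

Calling `summarize_cdf(grid, F_hat)` with a predicted CDF `F_hat` on a throughput grid returns the three summaries. The midpoint rule introduces a discretization error that shrinks as the grid is refined.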