Paper Title

Machine Learning Framework for Performance Anomaly in OpenMP Multi-Threaded Systems

Authors

Wang, Weidong; Luo, Wangda

Abstract

Some OpenMP multi-threaded applications increasingly suffer from performance anomalies owing to shared resource contention as well as software- and hardware-related problems. Such anomalies can result in failures and inefficiencies, and are among the main challenges in system resiliency. To minimize their impact, one must quickly and accurately detect and diagnose the performance anomalies that cause the failures. However, it is difficult to identify anomalies in the dynamic and noisy data collected by OpenMP multi-threaded monitoring infrastructures. This paper presents a novel machine learning framework for detecting performance anomalies in OpenMP multi-threaded systems. The framework is evaluated using the NAS Parallel Benchmarks (NPB), the EPCC OpenMP micro-benchmark suite, and the Jacobi benchmark. The experimental results demonstrate that the framework successfully identifies 90.3% of injected anomalies in OpenMP multi-threaded applications.
