AIPerf: Automated machine learning as an AI-HPC benchmark

Paper title

AIPerf: Automated machine learning as an AI-HPC benchmark

Authors

Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, Wenguang Chen

Abstract

The plethora of complex artificial intelligence (AI) algorithms and available high-performance computing (HPC) power stimulates the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems is emerging rapidly. The de facto HPC benchmark, LINPACK, cannot reflect AI computing power and I/O performance because it lacks a representative workload. Current popular AI benchmarks such as MLPerf have fixed problem sizes and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite utilizing automated machine learning (AutoML), which not only represents real AI scenarios but is also auto-adaptively scalable to machines of various scales. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use operations per second (OPS), measured with an analytical and systematic approach, as the major metric to quantify AI performance. We perform evaluations on various systems to ensure the benchmark's stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, our benchmark can scale easily and rank AI-HPC systems.
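To put the two measurements quoted in the abstract on a common footing, the sketch below divides each reported OPS figure by the accelerator count. This is plain arithmetic on the abstract's numbers, not the benchmark's own scoring procedure; the helper name `ops_per_accelerator` and the `reported` table are illustrative assumptions.

```python
# Figures quoted in the abstract: (accelerator count, measured OPS).
TERA = 1e12
PETA = 1e15

reported = {
    "NVIDIA Tesla T4": (32, 56.1 * TERA),
    "Huawei Ascend 910": (4096, 194.53 * PETA),
}


def ops_per_accelerator(total_ops: float, accelerators: int) -> float:
    """Average measured OPS per accelerator (hypothetical helper, not from the paper)."""
    return total_ops / accelerators


per_device = {
    name: ops_per_accelerator(total_ops, count)
    for name, (count, total_ops) in reported.items()
}

for name, value in per_device.items():
    print(f"{name}: {value / TERA:.2f} Tera-OPS per accelerator")
```

Because the two systems use different accelerators, these per-device averages are not a weak-scaling efficiency; that would require runs on the same hardware at several node counts, which the abstract does not list.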

Paper title