InferBench: Understanding Deep Learning Inference Serving with an Automatic Benchmarking System

Paper title

InferBench: Understanding Deep Learning Inference Serving with an Automatic Benchmarking System

Authors

Zhang, Huaizheng; Huang, Yizheng; Wen, Yonggang; Yin, Jianxiong; Guan, Kyle

Abstract

Deep learning (DL) models have become core modules for many applications. However, deploying these models without careful performance benchmarking that considers the impact of both hardware and software often leads to poor service and costly operational expenditure. To facilitate the deployment of DL models, we implement an automatic and comprehensive benchmark system for DL developers. To accomplish benchmark-related tasks, developers only need to prepare a configuration file consisting of a few lines of code. Our system, deployed to a leader server in a DL cluster, dispatches users' benchmark jobs to follower workers. Next, the system automatically generates the corresponding requests, workloads, and even models to conduct DL serving benchmarks. Finally, developers can leverage the many analysis tools and models in our system to gain insight into the trade-offs of different system configurations. In addition, a two-tier scheduler is incorporated to avoid unnecessary interference and improve average job completion time by up to 1.43x (equivalent to a 30% reduction). Our system design follows best practices in DL cluster operations to expedite developers' day-to-day DL service evaluation efforts. We conduct many benchmark experiments to provide in-depth and comprehensive evaluations. We believe these results are of great value as guidelines for DL service configuration and resource allocation.
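
The abstract equates a 1.43x improvement with a 30% reduction in time. The short sketch below checks that arithmetic, assuming the usual convention that a speedup of s means the new time is 1/s of the old time. The function name `speedup_to_reduction` is a hypothetical helper introduced only for illustration and is not part of InferBench.

```python
def speedup_to_reduction(speedup: float) -> float:
    """Return the fraction of time saved when a job runs `speedup` times faster.

    Assumes new_time = old_time / speedup, so the saved fraction is 1 - 1/speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # 1.43x faster corresponds to roughly a 30% shorter time.
    print(f"{speedup_to_reduction(1.43):.1%}")  # 30.1%
```

Under this convention the two figures in the abstract agree: 1 - 1/1.43 is about 0.30.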
