Paper Title
Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads
Paper Authors
Paper Abstract
Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly-efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e.g., GPUs, FPGAs). All jobs in Singularity are preemptable, migratable, and dynamically resizable (elastic) by default: a live job can be dynamically and transparently (a) preempted and migrated to a different set of nodes, cluster, data center or a region and resumed exactly from the point where the execution was preempted, and (b) resized (i.e., elastically scaled-up/down) on a varying set of accelerators of a given type. Our mechanisms are transparent in that they do not require the user to make any changes to their code or require using any custom libraries that may limit flexibility. Additionally, our approach significantly improves the reliability of deep learning workloads. We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on the steady-state performance. Finally, our design approach is agnostic of DNN architectures and handles a variety of parallelism strategies (e.g., data/pipeline/model parallelism).
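The preempt-migrate-resume semantics described above can be sketched in miniature. This is a hypothetical illustration, not Singularity's actual API: Singularity performs checkpointing transparently at the framework/device level, whereas here the checkpoint is made explicit so the resume-from-the-exact-preemption-point behavior is visible. The function names (`train`, `preempt_at`) are invented for this example.

```python
def train(total_steps, preempt_at=None, checkpoint=None):
    """Toy 'training' loop that can be preempted and later resumed.

    Returns (result, checkpoint): on preemption, result is None and
    checkpoint captures exact progress; on completion, checkpoint is None.
    """
    # Restore exact progress if resuming from a checkpoint.
    step = checkpoint["step"] if checkpoint else 0
    loss_sum = checkpoint["loss_sum"] if checkpoint else 0.0

    while step < total_steps:
        loss_sum += 1.0 / (step + 1)   # stand-in for one training step
        step += 1
        if preempt_at is not None and step == preempt_at:
            # Snapshot state so a different node can resume from here.
            return None, {"step": step, "loss_sum": loss_sum}
    return {"steps": step, "loss_sum": loss_sum}, None

# Preempt after 3 steps, "migrate" the checkpoint, and resume elsewhere:
_, ckpt = train(10, preempt_at=3)
result, _ = train(10, checkpoint=ckpt)
```

Because the resumed run replays no steps and skips none, `result` is identical to that of an uninterrupted run — the correctness-preservation property the abstract claims for live jobs.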