Title
Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR
Authors
Abstract
Large neural network models are commonly trained through a combination of advanced parallelism strategies in a single program, multiple data (SPMD) paradigm. For example, training large transformer models requires combining data, model, and pipeline partitioning with optimizer sharding techniques. However, identifying efficient combinations for many model architectures and accelerator systems requires significant manual analysis. In this work, we present an automatic partitioner that identifies these combinations through a goal-oriented search. Our key finding is that a Monte Carlo Tree Search-based partitioner, which integrates partition-specific compiler analysis directly into the search and is guided by goals, matches expert-level strategies for various models.
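To make the idea of a Monte Carlo Tree Search over partitioning decisions concrete, here is a minimal UCT sketch in Python. It is not PartIR's partitioner: the three strategy names, the chain of `OPS`, the `NUM_DEVICES` value, and the `cost` function are all hypothetical, illustrative stand-ins, with `cost` playing the role that partition-specific compiler analysis would play in a real system. Each tree level fixes the strategy for one op; the search runs the usual selection, expansion, random rollout, and backpropagation steps, and returns the cheapest complete assignment it evaluated.

```python
import math
import random
from itertools import product

# Hypothetical toy problem: a chain of ops, each assigned one partitioning strategy.
STRATEGIES = ("replicate", "batch", "model")
NUM_DEVICES = 4
# (compute cost, parameter bytes, activation bytes) per op; illustrative numbers only.
OPS = [(8.0, 1.0, 4.0), (16.0, 8.0, 2.0), (4.0, 0.5, 4.0)]


def cost(assignment):
    """Hypothetical cost model standing in for compiler analysis (lower is better)."""
    total = 0.0
    for i, (s, (flops, params, acts)) in enumerate(zip(assignment, OPS)):
        if s == "replicate":
            total += flops                        # every device does all the work
        elif s == "batch":
            total += flops / NUM_DEVICES + params  # plus a gradient all-reduce
        else:
            total += flops / NUM_DEVICES + acts    # plus an activation all-gather
        if i > 0 and assignment[i - 1] != s:
            total += acts                          # resharding between ops
    return total


class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix      # partial assignment: strategies for the first ops
        self.parent = parent
        self.children = {}        # strategy -> child Node
        self.visits = 0
        self.total_reward = 0.0

    def untried(self):
        return [s for s in STRATEGIES if s not in self.children]

    def is_terminal(self):
        return len(self.prefix) == len(OPS)


def uct_child(node, c):
    """Pick the child maximizing mean reward plus the UCB1 exploration bonus."""
    log_n = math.log(node.visits)
    return max(node.children.values(),
               key=lambda ch: ch.total_reward / ch.visits
               + c * math.sqrt(log_n / ch.visits))


def mcts(iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best, best_cost = None, math.inf
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not node.is_terminal() and not node.untried():
            node = uct_child(node, c)
        # 2. Expansion: fix the strategy of the next op.
        if not node.is_terminal():
            s = rng.choice(node.untried())
            child = Node(node.prefix + (s,), node)
            node.children[s] = child
            node = child
        # 3. Simulation: complete the remaining decisions at random.
        remaining = len(OPS) - len(node.prefix)
        rollout = node.prefix + tuple(rng.choice(STRATEGIES) for _ in range(remaining))
        rollout_cost = cost(rollout)
        if rollout_cost < best_cost:
            best, best_cost = rollout, rollout_cost
        reward = 1.0 / (1.0 + rollout_cost)  # map cost into (0, 1]
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best, best_cost


if __name__ == "__main__":
    assignment, found = mcts()
    exhaustive = min(product(STRATEGIES, repeat=len(OPS)), key=cost)
    print(assignment, found, exhaustive, cost(exhaustive))
```

On this 27-leaf toy space, an exhaustive `product` search is a cheap sanity check on the result; the point of a tree search is that it still applies when exhaustive enumeration does not. The `1 / (1 + cost)` reward is an arbitrary normalization chosen for the sketch, and its small reward gaps make the search explore broadly. How a real partitioner shapes rewards and goals is not specified here.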