论文标题
Tensorir:用于自动张开程序优化的抽象
TensorIR: An Abstraction for Automatic Tensorized Program Optimization
论文作者
论文摘要
在各种设备上部署深度学习模型已成为一个重要的话题。硬件专业化的浪潮为多维张量计算带来了一套多样的加速度基原始人。这些新的加速度原始基本以及新兴的机器学习模型带来了巨大的工程挑战。在本文中,我们提出了Tensorir,这是一种编译器抽象,用于使用这些张量计算原始素来优化程序。 Tensorir概括了现有机器学习编译器中使用的循环巢表示,以将张量计算作为一流的公民。最后,我们在抽象之上构建了一个端到端框架,以自动优化给定张量计算原始图的深度学习模型。实验结果表明,Tensorir编译会自动使用给定硬件后端的张量计算原始功能,并提供与跨平台的最新手工精制系统竞争的性能。
Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration primitives, along with the emerging machine learning models, bring tremendous engineering challenges. In this paper, we present TensorIR, a compiler abstraction for optimizing programs with these tensor computation primitives. TensorIR generalizes the loop nest representation used in existing machine learning compilers to bring tensor computation as the first-class citizen. Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives. Experimental results show that TensorIR compilation automatically uses the tensor computation primitives for given hardware backends and delivers performance that is competitive to state-of-art hand-optimized systems across platforms.