Paper Title


SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring

Paper Authors

Adhitha Dias, Kirshanthan Sundararajah, Charitha Saumya, Milind Kulkarni

Paper Abstract


Sparse tensor algebra computations have become important in many real-world applications like machine learning, scientific simulations, and data mining. Hence, automated code generation and performance optimizations for tensor algebra kernels are paramount. Recent advancements such as the Tensor Algebra Compiler (TACO) greatly generalize and automate the code generation for tensor algebra expressions. However, the code generated by TACO for many important tensor computations remains suboptimal due to the absence of a scheduling directive to support transformations such as distribution/fusion. This paper extends TACO's scheduling space to support kernel distribution/loop fusion in order to reduce asymptotic time complexity and improve locality of complex tensor algebra computations. We develop an intermediate representation (IR) for tensor operations called the branched iteration graph, which specifies the breakdown of the computation into smaller ones (kernel distribution) and then fuses (loop fusion) the outermost dimensions of the loop nests, while the innermost dimensions are distributed, to increase data locality. We describe the exchange of intermediate results between iteration spaces, the transformation in the IR, and its programmatic invocation. Finally, we show that the transformation can be used to optimize sparse tensor kernels. Our results show that this new transformation significantly improves the performance of several real-world tensor algebra computations compared to TACO-generated code.
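To make the distribution/fusion terminology concrete, here is a minimal, dense-only sketch of the three loop-nest shapes the abstract contrasts, for a two-operand chain A(i,k) = sum over j and l of B(i,j) * C(j,l) * D(l,k). This is an illustration under simplifying assumptions, not the TACO/SparseLNR implementation, which operates on sparse formats through TACO's code generator; the function names (fused, distributed, partially_fused), the dimension symbols I, J, L, K, and the dense row-major storage are all assumptions made for this sketch.

// Dense illustration of fused vs. distributed vs. partially fused loop nests.
// All matrices are flat row-major std::vector<double>; A is assumed zero-initialized.
#include <algorithm>
#include <cstddef>
#include <vector>

using Mat = std::vector<double>; // row-major, size = rows * cols

// (1) Fully fused single kernel: one loop nest, no temporary, but O(I*J*L*K)
//     work because the product B(i,j)*C(j,l) is recomputed for every k.
void fused(const Mat& B, const Mat& C, const Mat& D, Mat& A,
           std::size_t I, std::size_t J, std::size_t L, std::size_t K) {
  for (std::size_t i = 0; i < I; ++i)
    for (std::size_t j = 0; j < J; ++j)
      for (std::size_t l = 0; l < L; ++l)
        for (std::size_t k = 0; k < K; ++k)
          A[i*K + k] += B[i*J + j] * C[j*L + l] * D[l*K + k];
}

// (2) Fully distributed (kernel distribution): two independent loop nests with
//     O(I*J*L + I*L*K) work, but a full I-by-L temporary T that hurts locality.
void distributed(const Mat& B, const Mat& C, const Mat& D, Mat& A,
                 std::size_t I, std::size_t J, std::size_t L, std::size_t K) {
  Mat T(I * L, 0.0);
  for (std::size_t i = 0; i < I; ++i)
    for (std::size_t j = 0; j < J; ++j)
      for (std::size_t l = 0; l < L; ++l)
        T[i*L + l] += B[i*J + j] * C[j*L + l];
  for (std::size_t i = 0; i < I; ++i)
    for (std::size_t l = 0; l < L; ++l)
      for (std::size_t k = 0; k < K; ++k)
        A[i*K + k] += T[i*L + l] * D[l*K + k];
}

// (3) The shape targeted by the distribution/fusion transformation: the
//     outermost i loop is shared across the two distributed kernels, so the
//     temporary shrinks to a per-i vector t of length L. The work stays
//     O(I*J*L + I*L*K) while the working set is small and reused immediately.
void partially_fused(const Mat& B, const Mat& C, const Mat& D, Mat& A,
                     std::size_t I, std::size_t J, std::size_t L, std::size_t K) {
  std::vector<double> t(L);
  for (std::size_t i = 0; i < I; ++i) {
    std::fill(t.begin(), t.end(), 0.0);
    for (std::size_t j = 0; j < J; ++j)
      for (std::size_t l = 0; l < L; ++l)
        t[l] += B[i*J + j] * C[j*L + l];
    for (std::size_t l = 0; l < L; ++l)
      for (std::size_t k = 0; k < K; ++k)
        A[i*K + k] += t[l] * D[l*K + k];
  }
}

In this sketch, the fully fused form illustrates the asymptotic-complexity penalty the abstract mentions, while the partially fused form keeps the distributed form's work bound and reuses the small per-i temporary while it is still cache-resident, which is the locality benefit of fusing only the outermost dimensions.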
