Paper Title


A Tensor Compiler for Unified Machine Learning Prediction Serving

Paper Authors

Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi

Paper Abstract


Machine Learning (ML) adoption in the enterprise requires simpler and more efficient software infrastructure---the bespoke solutions typical in large web companies are simply untenable. Model scoring, the process of obtaining predictions from a trained model over new data, is a primary contributor to infrastructure complexity and cost as models are trained once but used many times. In this paper we propose HUMMINGBIRD, a novel approach to model scoring, which compiles featurization operators and traditional ML models (e.g., decision trees) into a small set of tensor operations. This approach inherently reduces infrastructure complexity and directly leverages existing investments in Neural Network compilers and runtimes to generate efficient computations for both CPU and hardware accelerators. Our performance results are intriguing: despite replacing imperative computations (e.g., tree traversals) with tensor computation abstractions, HUMMINGBIRD is competitive and often outperforms hand-crafted kernels on micro-benchmarks on both CPU and GPU, while enabling seamless end-to-end acceleration of ML pipelines. We have released HUMMINGBIRD as open source.
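To give a feel for how an inherently imperative model like a decision tree can be reduced to "a small set of tensor operations", below is a minimal sketch in the spirit of the GEMM-style compilation the paper describes. The toy tree, the matrix names (`A`–`E`), and the `predict` helper are illustrative assumptions, not the paper's actual implementation: every path of the tree is evaluated at once with dense matrix multiplications and comparisons, and the matching leaf is selected at the end, with no data-dependent branching.

```python
import numpy as np

# Hypothetical toy tree over 2 features (not taken from the paper):
#   node n0: x0 < 0.5 ? go left (to n1) : go right (leaf L2)
#   node n1: x1 < 0.3 ? leaf L0 : leaf L1
# Leaf-to-class mapping: L0 -> class 0, L1 -> class 1, L2 -> class 0.

A = np.array([[1.0, 0.0],     # feature -> internal-node selection
              [0.0, 1.0]])    # (n0 tests x0, n1 tests x1)
B = np.array([0.5, 0.3])      # per-node thresholds
C = np.array([[ 1,  1, -1],   # internal-node x leaf path matrix:
              [ 1, -1,  0]])  # +1 left subtree, -1 right subtree, 0 off-path
D = np.array([2, 1, 0])       # number of left turns on each leaf's path
E = np.array([[1, 0],         # leaf -> class one-hot map
              [0, 1],
              [1, 0]])

def predict(X):
    """Evaluate all tree paths with tensor ops instead of tree traversal."""
    T = (X @ A < B).astype(np.int64)           # which node conditions hold
    leaves = ((T @ C) == D).astype(np.int64)   # one-hot active leaf per row
    return np.argmax(leaves @ E, axis=1)       # predicted class per row

print(predict(np.array([[0.2, 0.7],
                        [0.9, 0.1],
                        [0.1, 0.1]])))
```

Because the whole computation is matrix algebra over a batch of inputs, it maps directly onto neural-network runtimes and hardware accelerators, which is the point the abstract makes about reusing existing NN compiler investments.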
