Paper Title
Training and Serving Machine Learning Models at Scale
Authors
Abstract
In recent years, Web services have become increasingly intelligent (e.g., in understanding user preferences) thanks to the integration of components that rely on Machine Learning (ML). Before users can interact with an ML-based service (ML-Service) in the inference phase, the underlying ML model must learn from existing data in the training phase, a process that requires long-running batch computations. Managing these two distinct phases is complex, and time and quality requirements can hardly be met with manual approaches. This paper highlights some of the major issues in managing ML-Services in both training and inference modes and presents some initial solutions that can meet set requirements with minimal user input. A preliminary evaluation shows that our solutions make these systems more efficient and more predictable with respect to their response time and accuracy.