Paper Title

Amazon SageMaker Autopilot: a white box AutoML solution at scale

Authors

Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin, Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav Miladinovic, Giovanni Zappella, Cédric Archambeau, Matthias Seeger, Bhaskar Dutt, Laurence Rouesnel

Abstract

AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on many datasets, there is still a non-negligible number of datasets for which the one-shot solution produced by each particular system would provide sub-par performance. In this paper, we present Amazon SageMaker Autopilot: a fully managed system providing an automated ML solution that can be modified when needed. Given a tabular dataset and the target column name, Autopilot identifies the problem type, analyzes the data and produces a diverse set of complete ML pipelines including feature preprocessing and ML algorithms, which are tuned to generate a leaderboard of candidate models. In the scenario where the performance is not satisfactory, a data scientist is able to view and edit the proposed ML pipelines in order to infuse their expertise and business knowledge without having to revert to a fully manual solution. This paper describes the different components of Autopilot, emphasizing the infrastructure choices that allow scalability, high quality models, editable ML pipelines, consumption of artifacts of offline meta-learning, and a convenient integration with the entire SageMaker suite allowing these trained models to be used in a production setting.
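
The flow the abstract describes (infer the problem type from the target column, then rank tuned candidates into a leaderboard) can be illustrated with a minimal sketch. This is not Autopilot's implementation: the function names, the `max_classes` threshold, and the type-detection heuristic are all assumptions made up for illustration, and the candidate scores are placeholder inputs rather than results from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def infer_problem_type(target: Sequence, max_classes: int = 20) -> str:
    """Guess the problem type from the target column.

    Hypothetical heuristic for illustration only; the abstract does not
    say how Autopilot makes this decision.
    """
    distinct = set(target)
    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target
    )
    if len(distinct) == 2:
        return "binary_classification"
    if all_numeric and len(distinct) > max_classes:
        return "regression"
    return "multiclass_classification"


def build_leaderboard(
    scores: Dict[str, float], higher_is_better: bool = True
) -> List[Tuple[str, float]]:
    """Rank candidate pipelines by validation score, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=higher_is_better)
```

In this sketch, a data scientist who is unhappy with the top entry would edit the underlying candidate pipeline and re-score it, rather than starting again from a fully manual solution.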
