Paper Title

Challenges and Barriers of Using Low Code Software for Machine Learning

Authors

Alamin, Md Abdullah Al; Uddin, Gias

Abstract

As big data grows ubiquitous across many domains, more and more stakeholders seek to develop Machine Learning (ML) applications on their data. The success of an ML application usually depends on close collaboration between ML experts and domain experts. However, the shortage of ML engineers remains a fundamental problem. Low-code machine learning tools and platforms (also known as AutoML) aim to democratize ML development for domain experts by automating many repetitive tasks in the ML pipeline. This research presents an empirical study of around 14k posts (questions + accepted answers) from Stack Overflow (SO) that contain AutoML-related discussions. We examine how the topics discussed in these posts are spread across the various Machine Learning Life Cycle (MLLC) phases, and how popular and difficult they are. This study offers several interesting findings. First, we find 13 AutoML topics, which we group into four categories. The MLOps topic category (43% of questions) is the largest, followed by Model (28% of questions), Data (27% of questions), and Documentation (2% of questions). Second, most questions are asked during the Model training (29%) (i.e., implementation) and Data preparation (25%) MLLC phases. Third, AutoML practitioners find the MLOps topic category the most challenging, especially topics related to model deployment and monitoring and to automated ML pipelines. These findings have implications for all three groups of AutoML stakeholders: AutoML researchers, AutoML service vendors, and AutoML developers. Collaboration between academia and industry can improve different aspects of AutoML, such as better DevOps/deployment support and tutorial-based documentation.
