Title
Switching Scheme: A Novel Approach for Handling Incremental Concept Drift in Real-World Data Sets
Authors
Abstract
Machine learning models nowadays play a crucial role in many applications in business and industry. However, models only start adding value once they are deployed into production. One challenge for deployed models is that the data change over time, an effect often described with the term concept drift. Due to their nature, concept drifts can severely affect the prediction performance of a machine learning system. In this work, we analyze the effects of concept drift in the context of a real-world data set. For efficient concept drift handling, we introduce the switching scheme, which combines the two principles of retraining and updating a machine learning model. Furthermore, we systematically analyze existing regular adaptation as well as triggered adaptation strategies. The switching scheme is instantiated on New York City taxi data, which is heavily influenced by changing demand patterns over time. We show that the switching scheme outperforms all other baselines and delivers promising prediction results.