Paper Title
SpotTune: Leveraging Transient Resources for Cost-efficient Hyper-parameter Tuning in the Public Cloud
Authors
Abstract
Hyper-parameter tuning (HPT) is crucial for many machine learning (ML) algorithms. Because the search space is large, however, HPT is usually time-consuming and resource-intensive. Many researchers now train machine learning models on public cloud resources, which is convenient yet expensive. Speeding up the HPT process while also reducing its cost is therefore important for cloud ML users. In this paper, we propose SpotTune, an approach that exploits transient, revocable resources in the public cloud, using tailored strategies to perform HPT in a parallel and cost-efficient manner. SpotTune orchestrates the HPT process on transient servers and uses two main techniques, fine-grained cost-aware resource provisioning and ML training trend prediction, to reduce both the monetary cost and the runtime of HPT processes. Our evaluations show that SpotTune can reduce cost by up to 90% and achieve a 16.61x improvement in performance-cost rate.