Paper title
SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems
Authors
Abstract
Gradient Boosted Decision Tree (GBDT) is a widely used machine learning algorithm that has been shown to achieve state-of-the-art results on many standard data science problems. We are interested in its application to multioutput problems when the output is highly multidimensional. Although there are highly effective GBDT implementations, their scalability to such problems is still unsatisfactory. In this paper, we propose novel methods aiming to accelerate the training process of GBDT in the multioutput scenario. The idea behind these methods lies in the approximate computation of the scoring function used to find the best split of decision trees. These methods are implemented in SketchBoost, which itself is integrated into our easily customizable Python-based GPU implementation of GBDT called Py-Boost. Our numerical study demonstrates that SketchBoost speeds up the training process of GBDT by a factor of more than 40 in the best cases while achieving comparable or even better performance.
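To make the idea of an approximate scoring function concrete, here is a minimal sketch in plain Python. It is not taken from SketchBoost or Py-Boost. It assumes a common split score for multioutput GBDT with unit Hessians, namely the squared norm of the summed gradient vectors in each leaf divided by the leaf size plus a regularizer `lam`. It also assumes that the approximation is done by multiplying the n x d gradient matrix by a Gaussian random matrix, so that split search works on k columns instead of d. The function names `split_score`, `random_projection`, and `best_threshold` are hypothetical and exist only for this illustration.

```python
import random


def split_score(grads, left_idx, right_idx, lam=1.0):
    """Assumed score: sum over both leaves of ||sum of gradient rows||^2 / (n_leaf + lam)."""
    width = len(grads[0])
    score = 0.0
    for idx in (left_idx, right_idx):
        col_sums = [sum(grads[i][j] for i in idx) for j in range(width)]
        score += sum(s * s for s in col_sums) / (len(idx) + lam)
    return score


def random_projection(grads, k, seed=0):
    """Reduce an n x d gradient matrix to n x k using a Gaussian random matrix."""
    rng = random.Random(seed)
    d = len(grads[0])
    scale = 1.0 / k ** 0.5
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [[sum(row[j] * proj[j][c] for j in range(d)) for c in range(k)]
            for row in grads]


def best_threshold(x, grads, lam=1.0):
    """Return the threshold t that maximizes the score of the split x <= t vs. x > t."""
    best_t, best_s = None, float("-inf")
    for t in sorted(set(x))[:-1]:
        left = [i for i, v in enumerate(x) if v <= t]
        right = [i for i, v in enumerate(x) if v > t]
        s = split_score(grads, left, right, lam)
        if s > best_s:
            best_t, best_s = t, s
    return best_t


# Toy data (made up for illustration): 6 samples, 50 outputs,
# with gradients that flip sign between x = 2 and x = 3.
x = [0, 1, 2, 3, 4, 5]
base = [((j % 7) - 3) / 3.0 for j in range(50)]
grads = [[g if xi < 3 else -g for g in base] for xi in x]

sketch = random_projection(grads, k=5)  # 50 output columns reduced to 5
print(best_threshold(x, grads), best_threshold(x, sketch))
```

On this toy data the split search over the 5-column sketch selects the same threshold as the search over all 50 outputs, while each score evaluation touches ten times fewer columns. Real data gives no such guarantee; the sketch only illustrates why a cheaper approximate score can still rank splits usefully.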