论文标题
广义且可扩展的最佳稀疏决策树
Generalized and Scalable Optimal Sparse Decision Trees
论文作者
论文摘要
从计算角度来看,决策树优化是困难的,但对于可解释的机器学习领域至关重要。尽管在过去40年中做出了努力,但直到最近才进行了优化的突破,这些突破允许实用算法找到最佳的决策树。这些新技术有可能触发范式变化,在那里可以构建稀疏的决策树以有效地优化各种目标函数,而无需依靠贪婪的分裂和修剪启发式方法,而启发式方法通常会导致次优溶液。这项工作的贡献是为决策树优化提供一个一般框架,该框架解决该领域的两个重要开放问题:处理不平衡的数据并在连续变量上完全优化。我们提出了在ROC凸船体下的F-Score,AUC和部分区域在内的各种目标上产生最佳决策树的技术。我们还引入了一种可扩展的算法,该算法在存在连续变量的情况下可产生最佳的结果,并将决策树构造加快相对于最先进的数量级。
Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that have allowed practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift where it is possible to construct sparse decision trees to efficiently optimize a variety of objective functions without relying on greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution in this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state-of-the art.