论文标题

决策树上的单个MCMC链并行

Single MCMC Chain Parallelisation on Decision Trees

论文作者

Drousiotis, Efthyvoulos, Spirakis, Paul G.

论文摘要

决策树在机器学习中是备受著名的,通常会获得最新的性能。尽管如此,诸如CART,ID3,随机森林和Boost树之类的知名变体错过了一个概率版本,该版本编码了有关树结构的先前假设,并在节点参数之间共享统计强度。贝叶斯决策树上的现有工作取决于马尔可夫链蒙特卡洛(MCMC),这可能在计算上很慢,尤其是在高维数据和昂贵的建议上。在这项研究中,我们提出了一种在普通笔记本电脑或个人计算机上平行单个MCMC决策树链的方法,使我们能够通过多核处理减少其运行时间,而结果在统计上与常规顺序实现相同。我们还计算了运行时间的理论和实际减少,可以利用我们的多处理器体系结构的方法获得。实验表明,只要串行和并行实现在统计学上相同,我们可以达到18倍的运行时间。

Decision trees are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian decision trees depend on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC decision tree chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源