Paper Title
Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity

Paper Authors

Jianghao Shen, Sicheng Wang, Zhangyang Wang

Paper Abstract
Despite the latest prevailing success of deep neural networks (DNNs), several concerns have been raised against their usage, including the lack of interpretability, the gap between DNNs and other well-established machine learning models, and the increasingly expensive computational costs. A number of recent works [1], [2], [3] explored the alternative of sequentially stacking decision tree/random forest building blocks in a purely feed-forward way, with no need for back propagation. Since decision trees enjoy inherent reasoning transparency, such deep forest models can also facilitate the understanding of the internal decision-making process. This paper further extends the deep forest idea in several important aspects. First, we employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions. Besides enhancing the flexibility, this also enables non-greedy optimization for each tree. Second, we propose an innovative topology learning strategy: every node in the tree now maintains a new learnable hyperparameter indicating the probability that it will be a leaf node. In that way, the tree jointly optimizes both its parameters and its topology during training. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1], [3], with dramatically reduced model complexity. For example, our model with only 1 layer of 15 trees performs comparably with the model in [3] with 2 layers of 2000 trees each.
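To make the soft-routing idea from the abstract concrete, the sketch below shows how a probabilistic tree produces a prediction: each internal node routes a sample left with probability `sigmoid(w·x + b)` (and right otherwise), so the output is a mixture over all leaf distributions weighted by path probabilities. This is a minimal illustrative sketch of soft routing in general, not the paper's actual model or training procedure; the node layout, function names, and parameter values are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_tree_predict(x, nodes, leaves):
    """Soft routing: every leaf contributes to the prediction, weighted
    by the probability of the root-to-leaf path.

    nodes:  dict node_id -> (w, b) for internal nodes; root has id 1,
            children of node i are 2*i (left) and 2*i + 1 (right)
    leaves: dict leaf_id -> class-probability list
    """
    num_classes = len(next(iter(leaves.values())))
    mixture = [0.0] * num_classes
    path_prob = {1: 1.0}        # probability of reaching each node
    frontier = [1]
    while frontier:
        i = frontier.pop()
        if i in leaves:
            # Accumulate this leaf's distribution, weighted by its path prob.
            for c, p in enumerate(leaves[i]):
                mixture[c] += path_prob[i] * p
            continue
        w, b = nodes[i]
        p_left = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        path_prob[2 * i] = path_prob[i] * p_left
        path_prob[2 * i + 1] = path_prob[i] * (1.0 - p_left)
        frontier += [2 * i, 2 * i + 1]
    return mixture

# Depth-1 example: one routing node, two leaves over 2 classes.
nodes = {1: ([1.0, -1.0], 0.0)}
leaves = {2: [0.9, 0.1], 3: [0.2, 0.8]}
print(soft_tree_predict([2.0, 0.0], nodes, leaves))
```

Because routing is probabilistic rather than a hard branch, the prediction is differentiable in the node parameters `(w, b)`, which is what permits the non-greedy, whole-tree optimization the abstract mentions; the paper's learnable per-node leaf probability can be viewed as one more parameter gating whether a node behaves as a leaf or routes onward.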