Paper Title

Ensemble Projection Pursuit for General Nonparametric Regression

Authors

Haoran Zhan, Mingke Zhang, Yingcun Xia

Abstract

The projection pursuit regression (PPR) has played an important role in the development of statistics and machine learning. However, when compared to other established methods like random forests (RF) and support vector machines (SVM), PPR has yet to showcase a similar level of accuracy as a statistical learning technique. In this paper, we revisit the estimation of PPR and propose an \textit{optimal} greedy algorithm and an ensemble approach via "feature bagging", hereafter referred to as ePPR, aiming to improve the efficacy. Compared to RF, ePPR has two main advantages. Firstly, its theoretical consistency can be proved for more general regression functions as long as they are $L^2$ integrable, and higher consistency rates can be achieved. Secondly, ePPR does not split the samples, and thus each term of PPR is estimated using the whole data, making the minimization more efficient and guaranteeing the smoothness of the estimator. Extensive comparisons based on real data sets show that ePPR is more efficient in regression and classification than RF and other competitors. The efficacy of ePPR, a variant of Artificial Neural Networks (ANN), demonstrates that with suitable statistical tuning, ANN can equal or even exceed RF in dealing with small to medium-sized datasets. This revelation challenges the widespread belief that ANN's superiority over RF is limited to processing extensive sample sizes.
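
To make the two ideas in the abstract concrete (greedy term-by-term fitting of ridge functions, and an ensemble built by feature bagging without splitting the samples), here is a minimal sketch in Python with NumPy. It is not the authors' algorithm. In particular, the direction search below is a crude random search rather than the optimal greedy step the paper proposes, and each ridge function is a cubic polynomial rather than a nonparametric smoother. All function names, parameters, and defaults (`fit_ppr`, `fit_eppr`, `n_candidates`, `n_features`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ppr(X, y, n_terms=3, n_candidates=50, degree=3, rng=None):
    """Greedy stagewise PPR sketch: y ~ intercept + sum_k g_k(w_k @ x).

    Assumption: each g_k is a polynomial and each w_k is the best of
    `n_candidates` random unit directions (a stand-in for a real optimizer).
    """
    rng = np.random.default_rng(rng)
    intercept = y.mean()
    residual = y - intercept
    terms = []
    for _ in range(n_terms):
        best = None
        for _ in range(n_candidates):
            w = rng.standard_normal(X.shape[1])
            w /= np.linalg.norm(w)
            z = X @ w  # project ALL samples onto the direction
            coef = np.polyfit(z, residual, degree)
            sse = np.sum((residual - np.polyval(coef, z)) ** 2)
            if best is None or sse < best[0]:
                best = (sse, w, coef)
        _, w, coef = best
        # Subtract the fitted ridge function and fit the next term to what is left.
        residual = residual - np.polyval(coef, X @ w)
        terms.append((w, coef))
    return intercept, terms


def predict_ppr(model, X):
    """Evaluate a fitted PPR model on the rows of X."""
    intercept, terms = model
    out = np.full(X.shape[0], intercept, dtype=float)
    for w, coef in terms:
        out += np.polyval(coef, X @ w)
    return out


def fit_eppr(X, y, n_learners=10, n_features=None, seed=0, **ppr_kwargs):
    """Feature-bagging ensemble sketch: every learner sees all samples
    but only a random subset of the columns."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Arbitrary illustrative default: use half of the features per learner.
    k = n_features if n_features is not None else max(1, p // 2)
    ensemble = []
    for _ in range(n_learners):
        cols = rng.choice(p, size=k, replace=False)
        ensemble.append((cols, fit_ppr(X[:, cols], y, rng=rng, **ppr_kwargs)))
    return ensemble


def predict_eppr(ensemble, X):
    """Average the predictions of all PPR learners in the ensemble."""
    return np.mean([predict_ppr(m, X[:, cols]) for cols, m in ensemble], axis=0)
```

A call such as `fit_eppr(X, y, n_learners=20, n_features=3)` followed by `predict_eppr(ensemble, X_new)` would give averaged predictions. Because every learner is fitted on the full sample, the only source of diversity in this sketch is the random choice of columns and candidate directions, which mirrors the contrast with sample-splitting methods that the abstract draws.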
