Paper title

Exhuming nonnegative garrote from oblivion using suitable initial estimates - illustration in low and high-dimensional real data

Authors

Edwin Kipruto, Willi Sauerbrei

Abstract

The nonnegative garrote (NNG) is among the first approaches that combine variable selection and shrinkage of regression estimates. When more than the derivation of a predictor is of interest, NNG has some conceptual advantages over the popular lasso. Nevertheless, NNG has received little attention. The original NNG relies on ordinary least-squares (OLS) estimates, which are highly variable in data with a high degree of multicollinearity (HDM) and do not exist in high-dimensional data (HDD). This may explain why NNG is not used with such data. Alternative initial estimates have been proposed but are hardly used in practice. Analyzing three structurally different data sets, we demonstrate that NNG can also be applied in HDM and HDD. We compare its performance with the lasso, adaptive lasso, relaxed lasso, and best subset selection in terms of variables selected, regression estimates, and prediction. Replacing OLS by ridge initial estimates in HDM and by lasso initial estimates in HDD helped NNG select simpler models than competing approaches without much increase in prediction error. Simpler models are easier to interpret, an important issue for descriptive modelling. Based on the limited experience from three data sets, we assume that NNG can be a suitable alternative to the lasso and its extensions. Neutral comparison simulation studies are needed to better understand the properties of variable selection methods, compare them, and derive guidance for practice.
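The two-stage idea behind NNG can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation. The sketch rests on several assumptions:

- y is centred and X has centred (and ideally standardised) columns, so there is no intercept.
- It uses the penalised (Lagrangian) form of the garrote, min 0.5*||y - Zc||^2 + lam*sum(c) with c >= 0 and Z[:, j] = X[:, j]*beta_init[j], in place of Breiman's constraint sum(c) <= s.
- The penalised problem is solved by coordinate descent.

The helper names `ridge_initial` and `nonnegative_garrote`, and the tuning values `alpha` and `lam`, are hypothetical. In practice both tuning values would be chosen, for example, by cross-validation.

Because the shrinkage factors c are nonnegative, each final coefficient keeps the sign of its initial estimate. A variable whose initial estimate is exactly zero stays out of the model. This is what happens when a lasso fit, computed elsewhere, is passed in as `beta_init`.

```python
import numpy as np


def ridge_initial(X, y, alpha):
    """Hypothetical helper: ridge estimates (X'X + alpha*I)^{-1} X'y.

    Assumes centred X and y (no intercept). With alpha = 0 and n > p this is OLS.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def nonnegative_garrote(X, y, beta_init, lam, n_iter=500, tol=1e-10):
    """Hypothetical helper: penalised nonnegative garrote via coordinate descent.

    Minimises 0.5*||y - Z c||^2 + lam*sum(c) subject to c >= 0, where
    Z[:, j] = X[:, j] * beta_init[j]. Returns the final coefficients c * beta_init.
    """
    Z = X * beta_init                 # scale each column by its initial estimate
    col_sq = (Z ** 2).sum(axis=0)     # squared column norms of Z
    c = np.zeros(X.shape[1])          # shrinkage factors, start with an empty model
    resid = np.asarray(y, dtype=float).copy()  # residual y - Z c (c = 0 here)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(Z.shape[1]):
            if col_sq[j] == 0.0:      # zero initial estimate: variable stays out
                continue
            # Correlation of Z_j with the partial residual that excludes variable j.
            rho = Z[:, j] @ resid + col_sq[j] * c[j]
            # Soft-threshold at lam and clip at zero (the nonnegativity constraint).
            new = max(0.0, (rho - lam) / col_sq[j])
            if new != c[j]:
                resid -= Z[:, j] * (new - c[j])
                max_change = max(max_change, abs(new - c[j]))
                c[j] = new
        if max_change < tol:          # a full pass changed nothing noticeably
            break
    return c * beta_init


if __name__ == "__main__":
    # Simulated high-dimensional example (p > n), where OLS is not unique,
    # so ridge estimates serve as the initial estimates.
    rng = np.random.default_rng(0)
    n, p = 50, 100
    X = rng.standard_normal((n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.standard_normal(n)
    y = y - y.mean()
    beta_init = ridge_initial(X, y, alpha=1.0)
    beta_nng = nonnegative_garrote(X, y, beta_init, lam=10.0)
    print("selected variables:", np.flatnonzero(beta_nng))
```

Swapping the initial estimator changes only `beta_init`. The garrote step is the same whether the initial estimates come from OLS, ridge, or the lasso.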
