Modular Regression: Improving Linear Models by Incorporating Auxiliary Data

Paper Title

Modular Regression: Improving Linear Models by Incorporating Auxiliary Data

Authors

Jin, Ying; Rothenhäusler, Dominik

Abstract

This paper develops a new framework, called modular regression, for using auxiliary information, such as variables other than the original features or additional data sets, in the training of linear models. At a high level, our method follows three steps: (i) decompose the regression task into several sub-tasks, (ii) fit the sub-task models, and (iii) combine the sub-task models into an improved estimate for the original regression problem. This routine applies to widely used low-dimensional (generalized) linear models and to high-dimensional regularized linear regression. It also extends naturally to missing-data settings where only partial observations are available. By incorporating auxiliary information, our approach improves estimation efficiency and prediction accuracy over linear regression or the Lasso under a conditional independence assumption for predicting the outcome. For high-dimensional settings, we develop an extension of our procedure that is robust to violations of this conditional independence assumption, in the sense that it improves efficiency if the assumption holds and coincides with the Lasso otherwise. We demonstrate the efficacy of our methods on simulated and real data sets.
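To make the three-step routine concrete, here is a minimal sketch built on one simple structure that we assume purely for illustration. The features are a scalar X, the auxiliary variable is a scalar Z, and the outcome is Y. All conditional means are linear, and Y ⟂ X | Z holds, meaning Y depends on X only through Z. Under these assumptions the identity E[Y | X] = E[E[Y | Z] | X] lets the regression of Y on X split into two sub-tasks. The first regresses Z on X, and it can use extra rows where Y is missing. The second regresses Y on Z. The two fits are then composed. The function names (`ols`, `modular_fit`, `simulate`) and the simulated model are hypothetical. This is not the authors' estimator, which also covers generalized linear models, the Lasso, and a robust high-dimensional variant.

```python
import random


def ols(u, v):
    """Least-squares fit of v ~ a + b*u for scalar u; returns (a, b)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    b = suv / suu
    return mv - b * mu, b


def modular_fit(x_xz, z_xz, z_zy, y_zy):
    """Illustrative modular estimate of E[Y | X] = alpha + beta * X.

    Assumes (for this sketch only) scalar X and Z, linear conditional
    means, and the conditional independence Y ⟂ X | Z.
    """
    # Sub-task 1: regress Z on X; rows without Y can be used here.
    a, b = ols(x_xz, z_xz)
    # Sub-task 2: regress Y on Z, using the rows where Y is observed.
    c, g = ols(z_zy, y_zy)
    # Combine: E[Y|X] = E[c + g*Z | X] = (c + g*a) + (g*b) * X.
    return c + g * a, g * b


def simulate(n, rng):
    """Draw n rows from an assumed chain X -> Z -> Y with E[Y|X] = 1 + 3X."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    zs = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
    ys = [-0.5 + 1.5 * z + rng.gauss(0, 1) for z in zs]
    return xs, zs, ys


rng = random.Random(0)
x_lab, z_lab, y_lab = simulate(200, rng)   # fully observed (X, Z, Y) rows
x_ext, z_ext, _ = simulate(5000, rng)      # auxiliary rows: Y treated as missing

alpha_mod, beta_mod = modular_fit(x_lab + x_ext, z_lab + z_ext, z_lab, y_lab)
alpha_ols, beta_ols = ols(x_lab, y_lab)    # baseline: direct regression of Y on X
print(f"modular:    alpha={alpha_mod:.3f}, beta={beta_mod:.3f}")
print(f"direct OLS: alpha={alpha_ols:.3f}, beta={beta_ols:.3f}")
```

The design choice being illustrated is that sub-task 1 never needs Y. It can therefore draw on the 5,000 auxiliary rows, while the direct regression is limited to the 200 labeled rows. How much this helps in a given run depends on the noise levels. If the assumption Y ⟂ X | Z fails, the composed estimate is generally biased. That failure mode is what the abstract's robust high-dimensional extension is meant to address.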