Paper Title

The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Paper Authors

Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

Paper Abstract

We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by online SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining. Our theory sheds light on the effectiveness and limitation of pretraining as well as the benefits of finetuning for tackling covariate shift problems.
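The abstract describes a concrete pipeline: pretrain a linear model on source data with one-pass (online) SGD, then finetune the resulting iterate on target data. Below is a minimal NumPy sketch of that pipeline under illustrative assumptions: Gaussian covariates whose diagonal covariances differ across the two domains, a shared ground-truth vector `w_star`, and fixed step sizes. The dimensions, spectra, sample sizes, and learning rates are placeholders, and the step-size and averaging choices used in the paper's analysis are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 20                                        # covariate dimension (illustrative)
w_star = rng.standard_normal(d) / np.sqrt(d)  # regression vector shared across domains

# Covariate shift: both domains share w_star, but the marginal distribution
# of x differs -- here, diagonal Gaussians whose spectra emphasize
# opposite ends of the coordinates.
source_spectrum = np.linspace(1.0, 0.05, d)
target_spectrum = np.linspace(0.05, 1.0, d)
noise_std = 0.1

def sample(spectrum, n):
    """Draw n pairs with x ~ N(0, diag(spectrum)) and y = <w_star, x> + noise."""
    x = rng.standard_normal((n, d)) * np.sqrt(spectrum)
    y = x @ w_star + noise_std * rng.standard_normal(n)
    return x, y

def online_sgd(w, x, y, lr):
    """One-pass SGD on the squared loss: each sample is visited exactly once."""
    for xi, yi in zip(x, y):
        w = w - lr * (xi @ w - yi) * xi
    return w

def target_excess_risk(w):
    """Excess risk on the target domain, E[<w - w_star, x>^2], for diagonal covariance."""
    return float(np.sum(target_spectrum * (w - w_star) ** 2))

n_target = 200
n_source = n_target ** 2  # the O(N^2) source-sample regime from the abstract

# Pretraining: online SGD over the source data, starting from zero.
x_s, y_s = sample(source_spectrum, n_source)
w_pretrained = online_sgd(np.zeros(d), x_s, y_s, lr=0.01)

# Finetuning: online SGD over a small amount of target data,
# starting from the pretrained iterate.
x_t, y_t = sample(target_spectrum, n_target // 10)
w_finetuned = online_sgd(w_pretrained, x_t, y_t, lr=0.01)

print("target excess risk, pretraining only:", target_excess_risk(w_pretrained))
print("target excess risk, after finetuning:", target_excess_risk(w_finetuned))
```

In this toy setup the source covariance down-weights exactly the directions the target covariance emphasizes, so pretraining learns those directions slowly; this is the kind of source-target mismatch for which the abstract argues that finetuning on even scarce target data can drastically reduce the source-data requirement.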
