Effective Sample Size, Dimensionality, and Generalization in Covariate Shift Adaptation

Paper title

Effective Sample Size, Dimensionality, and Generalization in Covariate Shift Adaptation

Authors

Felipe Maia Polo, Renato Vicente

Abstract

In supervised learning, training and test datasets are often sampled from distinct distributions. Domain adaptation techniques are thus required. Covariate shift adaptation yields good generalization performance when domains differ only by the marginal distribution of features. Covariate shift adaptation is usually implemented using importance weighting, which may fail, according to common wisdom, due to small effective sample sizes (ESS). Previous research argues this scenario is more common in high-dimensional settings. However, how effective sample size, dimensionality, and model performance/generalization are formally related in supervised learning, considering the context of covariate shift adaptation, is still somewhat obscure in the literature. Thus, a main challenge is presenting a unified theory connecting those points. Hence, in this paper, we focus on building a unified view connecting the ESS, data dimensionality, and generalization in the context of covariate shift adaptation. Moreover, we also demonstrate how dimensionality reduction or feature selection can increase the ESS, and argue that our results support dimensionality reduction before covariate shift adaptation as a good practice.
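The link the abstract draws between importance weighting, ESS, and dimensionality can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes Kish's definition of the effective sample size, (sum of w)^2 / (sum of w^2), and a hypothetical setup in which training features follow N(0, I) and test features follow N(shift * 1, I), so the true density ratio is known in closed form. The function names and the `shift` parameter are illustrative only.

```python
import numpy as np


def effective_sample_size(weights):
    """Kish's ESS: (sum w)^2 / sum(w^2). Assumed definition, not quoted from the paper."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def gaussian_importance_weights(x, shift):
    """Density ratio N(shift * 1, I) / N(0, I) evaluated at each row of x.

    The ratio equals exp(shift * sum(x) - d * shift^2 / 2). The weights are
    rescaled by their maximum for numerical stability, which leaves the ESS
    unchanged because the ESS is invariant to a common scaling of the weights.
    """
    d = x.shape[1]
    log_w = shift * x.sum(axis=1) - 0.5 * d * shift ** 2
    return np.exp(log_w - log_w.max())


rng = np.random.default_rng(0)
n = 5000  # number of training points (hypothetical)
ess_by_dim = {}
for d in (1, 5, 20):
    # Training features drawn from the source distribution N(0, I_d).
    x_train = rng.standard_normal((n, d))
    w = gaussian_importance_weights(x_train, shift=0.5)
    ess_by_dim[d] = effective_sample_size(w)

for d, ess in ess_by_dim.items():
    print(f"d={d:2d}  ESS={ess:8.1f}  ESS/n={ess / n:.4f}")
```

In this toy setting the same per-feature shift is applied in every dimension, so the weights become more concentrated on a few points as d grows and the ESS shrinks. Keeping fewer features before weighting corresponds to running the loop with a smaller d, which is in line with the abstract's argument for dimensionality reduction or feature selection before covariate shift adaptation.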
