Paper title

A Primer on Domain Adaptation

Authors

Pirmin Lemberger, Ivan Panico

Abstract

Standard supervised machine learning assumes that the distribution of the source samples used to train an algorithm is the same as that of the target samples on which it is supposed to make predictions. However, as any data scientist will confirm, this is hardly ever the case in practice. The set of statistical and numerical methods that deal with such situations is known as domain adaptation, a field with a long and rich history. The myriad of available methods and the unfortunate lack of a clear and universally accepted terminology can, however, make the topic rather daunting for the newcomer. Therefore, rather than aiming at completeness, which leads to exhibiting a tedious catalog of methods, this pedagogical review aims at a coherent presentation of four important special cases: (1) prior shift, a situation in which training samples were selected according to their labels without any knowledge of their actual distribution in the target; (2) covariate shift, which deals with a situation where training examples were picked according to their features but with some selection bias; (3) concept shift, where the dependence of the labels on the features differs between the source and the target; and, last but not least, (4) subspace mapping, which deals with a situation where features in the target have been subjected to an unknown distortion with respect to the source features. In each case we first build an intuition, next we provide the appropriate mathematical framework, and finally we describe a practical application.
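To make case (1) concrete, here is a minimal NumPy sketch of one standard prior-shift correction; it is not necessarily the method the review presents. Assuming that p(x|y) is identical in source and target, Bayes' rule lets one rescale a source-trained classifier's posteriors by the ratio of target to source label priors. When the target priors are unknown, they can be estimated from unlabeled target data with the EM procedure of Saerens, Latinne and Decaestecker (2002). The function names `adjust_posteriors` and `estimate_target_prior`, and the two-Gaussian toy data, are hypothetical illustrations.

```python
import numpy as np


def adjust_posteriors(post_src, prior_src, prior_tgt):
    """Rescale source posteriors p_S(y|x) to new label priors.

    Prior-shift assumption: p(x|y) is identical in source and target, so
    by Bayes' rule p_T(y|x) is proportional to p_S(y|x) * p_T(y) / p_S(y).
    post_src has shape (n_samples, n_classes); priors have shape (n_classes,).
    """
    ratio = np.asarray(prior_tgt, dtype=float) / np.asarray(prior_src, dtype=float)
    unnormalized = np.asarray(post_src, dtype=float) * ratio
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


def estimate_target_prior(post_src, prior_src, max_iter=1000, tol=1e-8):
    """Estimate unknown target priors with EM (Saerens et al., 2002 style).

    Returns the estimated target priors and the adjusted target posteriors.
    """
    prior_tgt = np.asarray(prior_src, dtype=float).copy()  # start from source priors
    for _ in range(max_iter):
        # E-step: posteriors under the current guess of the target priors.
        post_tgt = adjust_posteriors(post_src, prior_src, prior_tgt)
        # M-step: the new prior is the average posterior over target samples.
        new_prior = post_tgt.mean(axis=0)
        converged = np.max(np.abs(new_prior - prior_tgt)) < tol
        prior_tgt = new_prior
        if converged:
            break
    return prior_tgt, adjust_posteriors(post_src, prior_src, prior_tgt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20_000
    # Hypothetical unlabeled target: 80% class 0 ~ N(-1, 1), 20% class 1 ~ N(+1, 1).
    y = (rng.random(n) < 0.2).astype(int)
    x = rng.normal(loc=np.where(y == 1, 1.0, -1.0), scale=1.0)
    # Exact posteriors of a classifier trained on a balanced source:
    # p_S(y=1|x) = sigmoid(2x) for these two unit-variance Gaussians.
    p1 = 1.0 / (1.0 + np.exp(-2.0 * x))
    post_src = np.column_stack([1.0 - p1, p1])
    prior_tgt, _ = estimate_target_prior(post_src, prior_src=[0.5, 0.5])
    print(prior_tgt)  # expected to land near [0.8, 0.2]
```

Starting EM from the source priors is a common default. Note that the correction is only as reliable as the calibration of the source classifier's posteriors.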