Paper title
Source data selection for out-of-domain generalization
Paper authors
Paper abstract
Models that perform out-of-domain generalization borrow knowledge from heterogeneous source data and apply it to a related but distinct target task. Transfer learning has proven effective for accomplishing this generalization in many applications. However, a poorly chosen source dataset can lead to poor performance on the target, a phenomenon called negative transfer. To take full advantage of the available source data, this work studies source data selection with respect to a target task. We propose two source selection methods, based on multi-bandit theory and random search, respectively. We conduct a thorough empirical evaluation on both simulated and real data. Our proposals can also be viewed as diagnostics for the existence of reweighted source subsamples that perform better than a random selection of the available samples.
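To make the random-search idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes a simplified setting that the abstract does not specify: whole source datasets are either kept or dropped (no per-sample reweighting), the model is ordinary least squares, and a small labelled target validation set is available. All names (`random_search_selection`, `pooled_loss`, the synthetic data) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
w_target = np.array([1.0, -2.0, 0.5])  # assumed target coefficients (synthetic)


def make_dataset(coef, n=100, noise=0.1):
    """Draw a synthetic linear-regression dataset with the given coefficients."""
    X = rng.normal(size=(n, coef.size))
    y = X @ coef + noise * rng.normal(size=n)
    return X, y


# Sources 0 and 1 share the target's coefficients; sources 2 and 3 do not,
# so pooling them in should cause negative transfer.
sources = [make_dataset(w_target), make_dataset(w_target),
           make_dataset(-w_target), make_dataset(-w_target)]
X_val, y_val = make_dataset(w_target, n=50)  # small target validation set


def pooled_loss(mask):
    """Fit least squares on the selected sources; return MSE on the target set."""
    X = np.vstack([s[0] for s, keep in zip(sources, mask) if keep])
    y = np.concatenate([s[1] for s, keep in zip(sources, mask) if keep])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X_val @ coef - y_val) ** 2))


def random_search_selection(n_sources, n_trials=200, seed=0):
    """Sample random source subsets and keep the one with the lowest target loss."""
    search_rng = np.random.default_rng(seed)
    best_mask = np.ones(n_sources, dtype=bool)  # start from "use every source"
    best_loss = pooled_loss(best_mask)
    for _ in range(n_trials):
        mask = search_rng.random(n_sources) < 0.5
        if not mask.any():  # skip the empty subset
            continue
        loss = pooled_loss(mask)
        if loss < best_loss:
            best_mask, best_loss = mask, loss
    return best_mask, best_loss


baseline_loss = pooled_loss(np.ones(len(sources), dtype=bool))
best_mask, best_loss = random_search_selection(len(sources))
print("selected sources:", np.flatnonzero(best_mask))
print("all-sources MSE:", baseline_loss, "selected-subset MSE:", best_loss)
```

In this toy setup, a gap between the all-sources loss and the selected-subset loss plays the diagnostic role mentioned in the abstract: it signals that some subset of the source data transfers better than the full pool. A real application would score candidate subsets on held-out target data that is separate from the data used for the final evaluation.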