On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation

Paper Title

On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation

Authors

Shen, Maohao; Bu, Yuheng; Wornell, Gregory

Abstract

Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models, and focus on improving the pseudo-labeling techniques or proposing new training objectives. Instead, we aim to analyze the fundamental limits of MSFDA. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model, which illustrates an inherent bias-variance trade-off. We then provide insights on how to balance this trade-off from three perspectives: domain aggregation, selective pseudo-labeling, and joint feature alignment, which lead to the design of novel algorithms. Experiments on multiple datasets validate our theoretical analysis and demonstrate the state-of-the-art performance of the proposed algorithm, especially on some of the most challenging datasets, including Office-Home and DomainNet.
