Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation

Paper Title

Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation

Authors

Weimin Wu, Jiayuan Fan, Tao Chen, Hancheng Ye, Bo Zhang, Baopu Li

Abstract

The linear-ensemble-based strategy, i.e., the averaging ensemble, has been proposed to improve performance in unsupervised domain adaptation (UDA) tasks. However, a typical UDA task is usually challenged by dynamically changing factors, such as variable weather, views, and backgrounds in the unlabeled target domain. Most previous ensemble strategies ignore these dynamic and uncontrollable challenges and therefore face limited feature representations and performance bottlenecks. To enhance the model's adaptability between domains and to reduce the computational cost of deploying the ensemble model, we propose a novel framework, Instance-aware Model Ensemble With Distillation (IMED), which fuses multiple UDA component models adaptively according to different instances and distills these components into a small model. The core idea of IMED is a dynamic instance-aware ensemble strategy: for each instance, a nonlinear fusion subnetwork is learned that fuses the extracted features and predicted labels of multiple component models. The nonlinear fusion method helps the ensemble model handle dynamically changing factors. After learning a large-capacity ensemble model with good adaptability to different changing factors, we use the ensemble teacher model to guide the learning of a compact student model through knowledge distillation. Furthermore, we provide a theoretical analysis of the validity of IMED for UDA. Extensive experiments on various UDA benchmark datasets, e.g., Office-31, Office-Home, and VisDA-2017, show that the IMED-based model outperforms state-of-the-art methods at comparable computational cost.
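
As a rough illustration of the contrast drawn in the abstract, the sketch below compares a fixed averaging ensemble with an instance-dependent fusion of component predictions, then computes a simplified distillation loss between the fused teacher output and a student. It is not the authors' implementation: the confidence-based gate `instance_weights`, its temperature, and all numbers are hypothetical stand-ins for the learned nonlinear fusion subnetwork, and the feature-fusion part described in the abstract is omitted.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def average_ensemble(component_probs):
    """Linear (averaging) ensemble: identical weights for every instance."""
    n = len(component_probs)
    num_classes = len(component_probs[0])
    return [sum(p[c] for p in component_probs) / n for c in range(num_classes)]

def instance_weights(component_probs, gate_temperature=0.1):
    """Hypothetical gate: weight each component by its confidence (max
    probability) on this instance. A learned fusion subnetwork would
    replace this hand-written rule."""
    scores = [max(p) for p in component_probs]
    return softmax(scores, temperature=gate_temperature)

def instance_aware_ensemble(component_probs):
    """Fuse component predictions with weights that depend on the instance."""
    weights = instance_weights(component_probs)
    num_classes = len(component_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, component_probs))
        for c in range(num_classes)
    ]

def distillation_loss(teacher_probs, student_logits, temperature=1.0):
    """Simplified distillation objective: KL(teacher || student), with the
    student logits softened by `temperature`."""
    student_probs = softmax(student_logits, temperature)
    return sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0.0
    )

# One target-domain instance seen by two hypothetical UDA component models.
component_probs = [
    [0.90, 0.05, 0.05],  # component A is confident on this instance
    [0.40, 0.30, 0.30],  # component B is unsure
]
averaged = average_ensemble(component_probs)
fused = instance_aware_ensemble(component_probs)
loss = distillation_loss(fused, student_logits=[1.0, 0.2, 0.1])
print("averaged:", averaged)
print("fused:   ", fused)
print("distillation loss:", loss)
```

In this toy case the gate shifts almost all weight to the confident component, so the fused prediction follows component A, whereas the averaging ensemble dilutes it with the uncertain component B.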
