Paper title
Domain adaptation under structural causal models
Paper authors
Paper abstract
Domain adaptation (DA) arises as an important problem in statistical machine learning when the source data used to train a model differs from the target data used to test the model. Recent advances in DA have mainly been application-driven and have largely relied on the idea of a common subspace for source and target data. To understand the empirical successes and failures of DA methods, we propose a theoretical framework via structural causal models that enables analysis and comparison of the prediction performance of DA methods. This framework also allows us to itemize the assumptions needed for the DA methods to have a low target error. Additionally, with insights from our theory, we propose a new DA method called CIRM that outperforms existing DA methods when both the covariate and label distributions are perturbed in the target data. We complement the theoretical analysis with extensive simulations to show the necessity of the devised assumptions. Reproducible synthetic and real data experiments are also provided to illustrate the strengths and weaknesses of DA methods when some of the assumptions in our theory are violated.
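To make the "common subspace" idea concrete, here is a minimal sketch under loudly stated assumptions. It is not CIRM and not the paper's procedure. It uses a hypothetical toy linear structural causal model in which the label causes the covariates (Y → X), and the target domain applies a mean shift to the first covariate only, leaving the label distribution unchanged. A source-trained least-squares model then inherits a bias on the target. Projecting X onto the directions where the source and unlabeled target covariate means agree removes the shift, trading some source accuracy for a prediction that transfers. All names (`sample`, `fit`, `shift_tgt`, and so on) are illustrative.

```python
import numpy as np

# Hypothetical toy SCM (an assumption for illustration, not the paper's setup):
#   Y ~ N(0, 1);  X = w * Y + shift_e + N(0, I)
# The target domain shifts the mean of X along the first coordinate only;
# the label distribution is the same in both domains.
rng = np.random.default_rng(0)
n, w = 20_000, np.array([1.0, 1.0, 1.0])
shift_src = np.zeros(3)
shift_tgt = np.array([3.0, 0.0, 0.0])


def sample(shift):
    """Draw (X, Y) from the toy SCM with a mean shift added to X."""
    y = rng.standard_normal(n)
    x = np.outer(y, w) + shift + rng.standard_normal((n, 3))
    return x, y


def fit(x, y):
    """Least squares with an intercept; returns a prediction function."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda z: np.column_stack([np.ones(len(z)), z]) @ coef


def mse(pred, y):
    return float(np.mean((pred - y) ** 2))


x_src, y_src = sample(shift_src)
x_tgt, y_tgt = sample(shift_tgt)  # y_tgt is used only for evaluation

# Baseline: ordinary least squares trained on the source domain only.
ols = fit(x_src, y_src)
mse_ols_tgt = mse(ols(x_tgt), y_tgt)

# Common-subspace idea: keep only directions orthogonal to the estimated
# difference between target and source covariate means, then regress on them.
delta = x_tgt.mean(axis=0) - x_src.mean(axis=0)
_, _, vt = np.linalg.svd(delta[None, :])  # rows 1: span the orthogonal complement
basis = vt[1:].T                          # 3 x 2 orthonormal basis
inv = fit(x_src @ basis, y_src)
mse_inv_tgt = mse(inv(x_tgt @ basis), y_tgt)

print(f"OLS target MSE:       {mse_ols_tgt:.3f}")
print(f"projected target MSE: {mse_inv_tgt:.3f}")
```

For this toy model the population values are 13/16 ≈ 0.81 for the source-trained OLS on the target and 1/3 for the projected regression, so the printed numbers should land close to those. The projection helps here only because the shift acts on X alone. This simple mean-matching step is not designed for the case where the label distribution is also perturbed, which is the regime the abstract associates with CIRM.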