Paper Title
Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains
Paper Authors
Paper Abstract
Generalizing learned representations across significantly different visual domains is a fundamental yet crucial ability of the human visual system. While recent self-supervised learning methods have achieved good performance when evaluated on the same domain as the training set, their performance degrades undesirably when tested on a different domain. Therefore, the task of self-supervised learning from multiple domains is proposed to learn domain-invariant features that are not only suitable for evaluation on the same domain as the training set but can also be generalized to unseen domains. In this paper, we propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multiple domains, which designs a new pretext task, \emph{i.e.,} the cross-domain reconstruction task, to learn domain-invariant features. The core idea is to augment the input image with style noise from different domains and then reconstruct the image from the embedding of the augmented image, regularizing the encoder to learn domain-invariant features. To accomplish this idea, DiMAE contains two critical designs: 1) a content-preserved style mix, which adds style information from other domains to the input while preserving the content in a parameter-free manner, and 2) multiple domain-specific decoders, which recover the corresponding domain style of the input from the encoded domain-invariant features for reconstruction. Experiments on PACS and DomainNet show that DiMAE achieves considerable gains compared with recent state-of-the-art methods.
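
To make the two designs concrete, below is a minimal PyTorch-style sketch of the cross-domain reconstruction objective described in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the parameter-free style mix is instantiated here as a per-channel mean/std statistics swap (one plausible choice), and `encoder`, `decoders`, `alpha`, and the MAE-style masking step (omitted for brevity) are hypothetical.

```python
import torch
import torch.nn as nn

def content_preserved_style_mix(x, style, alpha=0.5):
    """One plausible parameter-free style mix (an assumption, not the
    paper's exact recipe): keep the spatial content of x while mixing
    in the per-channel color statistics of a style image from another
    domain. x, style: (B, C, H, W) image batches."""
    mu_x = x.mean(dim=(2, 3), keepdim=True)
    std_x = x.std(dim=(2, 3), keepdim=True) + 1e-6
    mu_s = style.mean(dim=(2, 3), keepdim=True)
    std_s = style.std(dim=(2, 3), keepdim=True) + 1e-6
    # Normalize the content image, then re-style with mixed statistics.
    mu = alpha * mu_s + (1 - alpha) * mu_x
    std = alpha * std_s + (1 - alpha) * std_x
    return (x - mu_x) / std_x * std + mu

class DiMAESketch(nn.Module):
    """Hypothetical skeleton of the cross-domain reconstruction task:
    a shared encoder plus one decoder per training domain."""

    def __init__(self, encoder, decoders):
        super().__init__()
        self.encoder = encoder                # shared, domain-invariant
        self.decoders = nn.ModuleList(decoders)  # one per domain

    def forward(self, x, style_x, domain_id):
        # 1) Augment the input with style noise from another domain.
        x_aug = content_preserved_style_mix(x, style_x)
        # 2) Encode the augmented image (MAE-style masking would go here).
        z = self.encoder(x_aug)
        # 3) Reconstruct the original image in its *own* domain style,
        #    so the encoder must discard the injected style noise.
        x_rec = self.decoders[domain_id](z)
        return ((x_rec - x) ** 2).mean()      # reconstruction loss
```

Because the target of the reconstruction is the unmixed input while the encoder only sees the style-corrupted version, the encoder is pushed to encode content rather than domain style, which is the stated regularization effect of the pretext task.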