SALAD: Source-free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection

Paper title

SALAD: Source-free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection

Authors

Kothandaraman, Divya; Shekhar, Sumit; Sancheti, Abhilasha; Ghuhan, Manoj; Shukla, Tripti; Manocha, Dinesh

Abstract

We present a novel method, SALAD, for the challenging vision task of adapting a pre-trained "source" domain network to a "target" domain, with a small budget for annotation in the "target" domain and a shift in the label space. Further, the task assumes that the source data is not available for adaptation, due to privacy concerns or otherwise. We postulate that such systems need to jointly optimize the dual task of (i) selecting a fixed number of samples from the target domain for annotation and (ii) transferring knowledge from the pre-trained network to the target domain. To do this, SALAD consists of a novel Guided Attention Transfer Network (GATN) and an active learning function, HAL. The GATN enables feature distillation from the pre-trained network to the target network, complemented with the target samples mined by HAL using transferability and uncertainty criteria. SALAD has three key benefits: (i) it is task-agnostic, and can be applied across various visual tasks such as classification, segmentation and detection; (ii) it can handle shifts in output label space from the pre-trained source network to the target domain; (iii) it does not require access to source data for adaptation. We conduct extensive experiments across 3 visual tasks, viz. digits classification (MNIST, SVHN, VisDA), synthetic (GTA5) to real (Cityscapes) image segmentation, and document layout detection (PubLayNet to DSSE). We show that our source-free approach, SALAD, results in an improvement of 0.5%-31.3% (across datasets and tasks) over prior adaptation methods that assume access to large amounts of annotated source data for adaptation.
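
The abstract says that HAL mines a fixed number of target samples using transferability and uncertainty criteria, but it does not give the scoring function. As a rough illustration of budgeted selection only, the sketch below ranks unlabeled target samples by predictive entropy, a common stand-in for uncertainty, and keeps the top `budget` of them. This is an assumption, not the authors' HAL: the names `predictive_entropy` and `select_for_annotation` and the example probabilities are hypothetical, and the transferability criterion is left out because the abstract does not define it.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(predictions, budget):
    """Return indices of the `budget` most uncertain target samples.

    `predictions` is a list of per-sample class-probability lists.
    Ties go to the lower index, because sorted() is stable.
    NOTE: entropy alone is an assumed stand-in for HAL's criteria.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    scores = [predictive_entropy(p) for p in predictions]
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:budget]


# Hypothetical softmax outputs for four unlabeled target images.
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
chosen = select_for_annotation(predictions, budget=2)
print(chosen)
```

With these made-up probabilities the two samples sent for annotation are the nearly uniform one and the `[0.70, 0.20, 0.10]` one. A selector closer to the described method would combine this score with a transferability term before ranking.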

Paper title