Paper Title

DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations

Paper Authors

Ximeng Sun, Ping Hu, Kate Saenko

Abstract

Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications. Recent work learns an alignment between textual and visual spaces to compensate for insufficient image labels, but loses accuracy because of the limited amount of available MLR annotations. In this work, we utilize the strong alignment of textual and visual features pretrained with millions of auxiliary image-text pairs and propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR. DualCoOp encodes positive and negative contexts with class names as part of the linguistic input (i.e. prompts). Since DualCoOp only introduces a very light learnable overhead upon the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks that have limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the advantages of our approach over state-of-the-art methods.
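To make the mechanism described in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' released code) of the dual-prompt idea: each class owns a learnable positive and a learnable negative context that is prepended to the class-name embedding, passed through a frozen text encoder, and scored against features from a frozen image encoder, so that only the context vectors receive gradients. The stand-in linear encoders, the prompt length, the feature dimension, and the simple positive-minus-negative scoring with binary cross-entropy are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of dual (positive/negative) prompt contexts for multi-label recognition.
# The "encoders" below are placeholder linear layers; in the actual framework they would be
# a frozen pretrained vision-language model. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPromptHead(nn.Module):
    def __init__(self, num_classes: int, ctx_len: int = 16, embed_dim: int = 512):
        super().__init__()
        # Learnable positive and negative context vectors: the only trainable parameters.
        self.pos_ctx = nn.Parameter(torch.randn(num_classes, ctx_len, embed_dim) * 0.02)
        self.neg_ctx = nn.Parameter(torch.randn(num_classes, ctx_len, embed_dim) * 0.02)
        # Stand-ins for frozen class-name token embeddings and a frozen text encoder.
        self.register_buffer("name_emb", torch.randn(num_classes, 1, embed_dim))
        self.text_encoder = nn.Linear(embed_dim, embed_dim)
        for p in self.text_encoder.parameters():
            p.requires_grad_(False)

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        # image_feat: (batch, embed_dim) from a frozen image encoder, L2-normalized.
        def encode(ctx: torch.Tensor) -> torch.Tensor:
            tokens = torch.cat([ctx, self.name_emb], dim=1)   # (C, ctx_len + 1, D)
            feat = self.text_encoder(tokens).mean(dim=1)      # pooled text feature per class
            return F.normalize(feat, dim=-1)                  # (C, D)

        pos = encode(self.pos_ctx)
        neg = encode(self.neg_ctx)
        pos_logit = image_feat @ pos.t()                      # (B, C)
        neg_logit = image_feat @ neg.t()
        # A class is predicted present when its positive prompt outscores its negative prompt.
        return pos_logit - neg_logit


if __name__ == "__main__":
    head = DualPromptHead(num_classes=20)
    img = F.normalize(torch.randn(4, 512), dim=-1)            # placeholder frozen image features
    logits = head(img)                                        # (4, 20)
    labels = torch.randint(0, 2, (4, 20)).float()             # multi-label targets
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    loss.backward()                                           # gradients reach only pos_ctx / neg_ctx
    print(logits.shape, loss.item())
```

Because only the two sets of context vectors are trainable while the backbone stays frozen, the adaptation overhead is small compared with fine-tuning the whole model, which is what allows fast adaptation under limited annotations and extension to unseen classes.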
