Paper Title
Improving Long-tailed Object Detection with Image-Level Supervision by Multi-Task Collaborative Learning
Paper Authors
Paper Abstract
Data in real-world object detection often exhibits a long-tailed distribution. Existing solutions tackle this problem by mitigating the competition between the head and tail categories. However, due to the scarcity of training samples, tail categories are still unable to learn discriminative representations. Bringing more data into the training may alleviate the problem, but collecting instance-level annotations is an excruciating task. In contrast, image-level annotations are easily accessible but not fully exploited. In this paper, we propose a novel framework, CLIS (multi-task Collaborative Learning with Image-level Supervision), which leverages image-level supervision to enhance detection ability in a multi-task collaborative way. Specifically, our framework contains an object-detection task (consisting of an instance-classification task and a localization task) and an image-classification task, which are responsible for utilizing the two types of supervision. The different tasks are trained collaboratively through three key designs: (1) task-specialized sub-networks that learn specific representations for different tasks without feature entanglement; (2) a siamese sub-network for the image-classification task that shares its knowledge with the instance-classification task, resulting in feature enrichment of the detector; (3) a contrastive learning regularization that maintains representation consistency, bridging the feature gap between the different forms of supervision. Extensive experiments are conducted on the challenging LVIS dataset. Without sophisticated loss engineering, CLIS achieves an overall AP of 31.1 with a 10.1-point improvement on tail categories, establishing a new state-of-the-art. Code will be available at https://github.com/waveboo/CLIS.
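To make design (3) concrete, below is a minimal sketch of an InfoNCE-style contrastive consistency regularizer of the kind the abstract describes: instance-level features are pulled toward the index-aligned image-level features of the same class and pushed away from the rest. The function names, temperature value, and index-aligned positive-pair convention are illustrative assumptions, not the authors' actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_consistency_loss(inst_feats, img_feats, temperature=0.1):
    """InfoNCE-style regularizer (illustrative): for each instance feature,
    the image-level feature at the same index is the positive; all other
    image-level features act as negatives."""
    total = 0.0
    for i, f in enumerate(inst_feats):
        logits = [cosine(f, g) / temperature for g in img_feats]
        m = max(logits)  # subtract max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total += -math.log(exps[i] / sum(exps))
    return total / len(inst_feats)
```

Minimizing this loss drives the two feature spaces toward consistency: perfectly aligned instance/image features yield a near-zero loss, while mismatched pairs are penalized.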