Class-Difficulty Based Methods for Long-Tailed Visual Recognition

Paper Title

Class-Difficulty Based Methods for Long-Tailed Visual Recognition

Authors

Saptarshi Sinha, Hiroki Ohashi, Katsuyuki Nakamura

Abstract

Long-tailed datasets, in which a few classes (known as majority or head classes) have far more data samples than the remaining classes (known as minority or tail classes), are frequently encountered in real-world use cases. Training deep neural networks on such datasets gives results biased towards the head classes. So far, researchers have proposed multiple weighted-loss and data re-sampling techniques to reduce this bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weight or attention. Here, we argue that this assumption might not always hold. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measure of each class to design a novel weighted-loss technique called "class-wise difficulty based weighted (CDB-W) loss" and a novel data sampling technique called "class-wise difficulty based sampling (CDB-S)". To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets that resemble real-world use cases, such as ImageNet-LT, LVIS and EGTEA.
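The abstract does not say how class difficulty is measured or how it is turned into weights, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that difficulty is one minus per-class accuracy on held-out data, that the loss weight is that difficulty raised to an exponent `tau`, and that sampling probabilities are proportional to the weights. All function names and the `tau` parameter are hypothetical and are not taken from the listing.

```python
import math


def class_difficulty(correct, total):
    """Assumed difficulty measure: 1 - per-class accuracy (not stated in the abstract)."""
    return [1.0 - (c / t if t > 0 else 0.0) for c, t in zip(correct, total)]


def difficulty_weights(difficulty, tau=1.0):
    """Assumed weighting: difficulty ** tau, where tau is a hypothetical exponent."""
    return [d ** tau for d in difficulty]


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log(p[y]) over a batch of predicted probability vectors."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def sampling_probabilities(weights):
    """Assumed sampling rule: pick a class with probability proportional to its weight."""
    total = sum(weights)
    if total == 0:
        # Every class is "easy": fall back to uniform sampling.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Hypothetical per-class validation counts: correct predictions and total samples.
difficulty = class_difficulty(correct=[90, 40, 5], total=[100, 50, 10])
weights = difficulty_weights(difficulty, tau=1.0)
class_probs = sampling_probabilities(weights)
```

Because the weights depend on measured accuracy and not on class frequency, recomputing them during training lets a poorly learned head class receive more weight than a well-learned tail class, which matches the argument made in the abstract.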
