Paper Title
Open Set Recognition using Vision Transformer with an Additional Detection Head
Paper Authors
Paper Abstract
Deep neural networks have demonstrated strong performance on image classification in the closed set setting, where the test data come from the same distribution as the training data. However, in the more realistic open set scenario, traditional classifiers with incomplete knowledge cannot handle test data that do not belong to any of the training classes. Open set recognition (OSR) aims to address this problem by simultaneously identifying unknown classes and distinguishing among known classes. In this paper, we propose a novel approach to OSR based on the vision transformer (ViT). Specifically, our approach employs two separate training stages. First, a ViT model is trained to perform closed set classification. Then, an additional detection head is attached to the embedded features extracted by the ViT and trained to force the representations of known data into compact, class-specific clusters. Test examples are identified as known or unknown based on their distance to the cluster centers. To the best of our knowledge, this is the first work to leverage ViT for OSR, and our extensive evaluation on several OSR benchmark datasets shows that our approach significantly outperforms other baseline methods and achieves new state-of-the-art performance.
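The test-time decision described in the abstract (comparing an example's distance to the class cluster centers) can be illustrated with a minimal sketch. The abstract does not specify the distance metric, how the centers are obtained, or how the cutoff is chosen, so the Euclidean distance, the hard-coded centers, the `threshold` parameter, and the function names `nearest_center` and `open_set_predict` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def nearest_center(
    embedding: Sequence[float],
    centers: Dict[str, Sequence[float]],
) -> Tuple[Optional[str], float]:
    """Return the label of the closest cluster center and its distance.

    Euclidean distance is an assumption; the abstract only says "distance".
    """
    best_label, best_dist = None, math.inf
    for label, center in centers.items():
        dist = math.dist(embedding, center)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


def open_set_predict(
    embedding: Sequence[float],
    centers: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return a known-class label, or None when the example looks unknown.

    An example is treated as unknown when it lies farther than `threshold`
    (a hypothetical, user-chosen cutoff) from every cluster center.
    """
    label, dist = nearest_center(embedding, centers)
    return label if dist <= threshold else None


# Toy 2-D centers standing in for detection-head cluster centers (made up).
centers = {"cat": (0.0, 0.0), "dog": (4.0, 0.0)}

print(open_set_predict((0.5, 0.2), centers, threshold=1.0))  # close to "cat"
print(open_set_predict((2.0, 5.0), centers, threshold=1.0))  # far from both
```

In practice the embeddings would come from the trained detection head rather than hand-written tuples, and the threshold would typically be tuned on held-out data.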