Paper Title
Generalizable Re-Identification from Videos with Cycle Association
Authors
Abstract
In this paper, we are interested in learning a generalizable person re-identification (re-ID) representation from unlabeled videos. Compared with 1) the popular unsupervised re-ID setting, where the training and test sets are typically drawn from the same domain, and 2) the popular domain generalization (DG) re-ID setting, where the training samples are labeled, our novel scenario combines their key challenges: the training samples are unlabeled, and they are collected from various domains that do not align with the test domain. In other words, we aim to learn a representation in an unsupervised manner and directly use the learned representation for re-ID in novel domains. To fulfill this goal, we make two main contributions: first, we propose Cycle Association (CycAs), a scalable self-supervised learning method for re-ID with low training complexity; second, we construct a large-scale unlabeled re-ID dataset named LMP-video, tailored for the proposed method. Specifically, CycAs learns re-ID features by enforcing cycle consistency of instance association between temporally successive video frame pairs, and the training cost is merely linear in the data size, making large-scale training possible. The LMP-video dataset, in turn, is extremely large, containing 50 million unlabeled person images cropped from over 10K YouTube videos, and is therefore sufficient to serve as fertile soil for self-supervised learning. We show that CycAs, trained on LMP-video, generalizes well to novel domains. The achieved results sometimes even outperform supervised domain generalizable models. Remarkably, CycAs achieves 82.2% Rank-1 on Market-1501 and 49.0% Rank-1 on MSMT17 with zero human annotation, surpassing state-of-the-art supervised DG re-ID methods. Moreover, we also demonstrate the superiority of CycAs under the canonical unsupervised re-ID and the pretrain-and-finetune scenarios.
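The idea of cycle consistency between two frames can be illustrated with a small sketch. This is not the loss used in the paper, whose exact formulation the abstract does not give. It assumes one plausible reading: person features in frame A are softly matched to frame B, matched back to frame A, and each instance is penalized when the round trip fails to return to itself. The function names, the `temperature` parameter, and the toy feature vectors are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(row, temperature):
    """Numerically stable softmax over one row of similarities."""
    peak = max(row)
    exps = [math.exp((x - peak) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cycle_association_loss(feats_a, feats_b, temperature=0.1):
    """Hypothetical cycle-consistency loss between two frames.

    feats_a, feats_b: lists of feature vectors, one per person crop.
    Returns the mean negative log-probability that each instance in
    frame A returns to itself after the soft round trip A -> B -> A.
    """
    a = [l2_normalize(f) for f in feats_a]
    b = [l2_normalize(f) for f in feats_b]
    # Cosine similarity between every A instance and every B instance.
    sim = [[sum(x * y for x, y in zip(fa, fb)) for fb in b] for fa in a]
    sim_t = [list(col) for col in zip(*sim)]
    forward = [softmax(row, temperature) for row in sim]     # A -> B
    backward = [softmax(row, temperature) for row in sim_t]  # B -> A
    loss = 0.0
    for i in range(len(a)):
        # Probability of landing back on instance i after the cycle.
        p_return = sum(forward[i][k] * backward[k][i] for k in range(len(b)))
        loss -= math.log(p_return + 1e-12)
    return loss / len(a)


# Toy example: two people, seen in two successive frames.
frame_a = [[1.0, 0.0], [0.0, 1.0]]
frame_b_distinct = [[0.9, 0.1], [0.1, 0.9]]   # features separate the people
frame_b_collapsed = [[1.0, 1.0], [1.0, 1.0]]  # features cannot tell them apart

loss_distinct = cycle_association_loss(frame_a, frame_b_distinct)
loss_collapsed = cycle_association_loss(frame_a, frame_b_collapsed)
```

In this sketch the loss is low when the features make each person distinguishable across frames and high when they collapse, and it needs no identity labels. Each frame pair is processed independently, which is consistent with the abstract's claim that training cost grows linearly with the data size.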