Title
Set Augmented Triplet Loss for Video Person Re-Identification
Authors
Abstract
Modern video person re-identification (re-ID) models are often trained with a metric learning approach, supervised by a triplet loss. The triplet loss used in video re-ID is usually based on so-called clip features, each aggregated from a few frame features. In this paper, we propose to model a video clip as a set and instead study the distance between sets in the corresponding triplet loss. In contrast to the distance between clip representations, the distance between clip sets considers the pairwise similarity of each element (i.e., frame representation) across the two sets. This allows the network to directly optimize the feature representation at the frame level. Apart from the commonly used set distance metrics (e.g., ordinary distance and Hausdorff distance), we further propose a hybrid distance metric tailored for the set-aware triplet loss. In addition, we propose a hard positive set construction strategy using the learned class prototypes in a batch. Our method achieves state-of-the-art results across several standard benchmarks, demonstrating its advantages.
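To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of a set-distance triplet loss over frame-feature sets. It uses the symmetric Hausdorff distance as one example of a set metric mentioned in the abstract; the function names, the NumPy formulation, and the margin value are illustrative assumptions, and the paper's hybrid metric and hard positive construction are not reproduced here.

```python
import numpy as np

def pairwise_dist(A, B):
    # A: (m, d) frame features of one clip; B: (n, d) frame features of another.
    # Returns the (m, n) matrix of Euclidean distances between all frame pairs.
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def hausdorff_dist(A, B):
    # Symmetric Hausdorff distance between two frame-feature sets:
    # the largest "nearest-neighbor" distance in either direction.
    D = pairwise_dist(A, B)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def set_triplet_loss(anchor, pos, neg, margin=0.3, dist=hausdorff_dist):
    # Margin-based triplet loss where the clip-to-clip distance is a
    # set distance over frame representations, so gradients (in a real
    # autodiff framework) would flow to individual frame features.
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)
```

Because the loss is defined on the frame sets themselves, swapping `dist` for another set metric (e.g., the minimum or average pairwise distance as an "ordinary" set distance) changes which frame pairs dominate the gradient.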