Title
Set Augmented Triplet Loss for Video Person Re-Identification
Authors
Abstract
Modern video person re-identification (re-ID) models are often trained with a metric learning approach, supervised by a triplet loss. The triplet loss used in video re-ID is usually based on so-called clip features, each aggregated from a few frame features. In this paper, we propose to model a video clip as a set and instead study the distance between sets in the corresponding triplet loss. In contrast to the distance between clip representations, the distance between clip sets considers the pairwise similarity of each element (i.e., frame representation) across the two sets. This allows the network to directly optimize the feature representation at the frame level. Apart from the commonly used set distance metrics (e.g., ordinary distance and Hausdorff distance), we further propose a hybrid distance metric tailored for the set-aware triplet loss. In addition, we propose a hard positive set construction strategy using the learned class prototypes in a batch. Our method achieves state-of-the-art results across several standard benchmarks, demonstrating its advantages.
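To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of a set-distance triplet loss over frame-feature sets. It uses the symmetric Hausdorff distance as one example of a set metric mentioned in the abstract; the function names, the NumPy formulation, and the margin value are illustrative assumptions, and the paper's hybrid metric and hard positive construction are not reproduced here.

```python
import numpy as np

def pairwise_dist(A, B):
    # A: (m, d) frame features of one clip; B: (n, d) frame features of another.
    # Returns the (m, n) matrix of Euclidean distances between all frame pairs.
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def hausdorff_dist(A, B):
    # Symmetric Hausdorff distance between two frame-feature sets:
    # the largest "nearest-neighbor" distance in either direction.
    D = pairwise_dist(A, B)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def set_triplet_loss(anchor, pos, neg, margin=0.3, dist=hausdorff_dist):
    # Margin-based triplet loss where the clip-to-clip distance is a
    # set distance over frame representations, so gradients (in a real
    # autodiff framework) would flow to individual frame features.
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)
```

Because the loss is defined on the frame sets themselves, swapping `dist` for another set metric (e.g., the minimum or average pairwise distance as an "ordinary" set distance) changes which frame pairs dominate the gradient.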