Paper title
CoKe: Localized Contrastive Learning for Robust Keypoint Detection
Paper authors
Paper abstract
In this paper, we introduce a contrastive learning framework for keypoint detection (CoKe). Keypoint detection differs from other visual tasks where contrastive learning has been applied because the input is a set of images in which multiple keypoints are annotated. This requires contrastive learning to be extended so that the keypoints are represented and detected independently, which enables the contrastive loss to make the keypoint features different from each other and from the background. Our approach has two benefits: it enables us to exploit contrastive learning for keypoint detection, and, by detecting each keypoint independently, it makes detection more robust to occlusion than holistic methods, such as stacked hourglass networks, which attempt to detect all keypoints jointly. Our CoKe framework introduces several technical innovations. In particular, we introduce: (i) a clutter bank to represent non-keypoint features; (ii) a keypoint bank that stores prototypical representations of keypoints to approximate the contrastive loss between keypoints; and (iii) a cumulative moving average update to learn the keypoint prototypes while training the feature extractor. Our experiments on a range of diverse datasets (PASCAL3D+, MPII, ObjectNet3D) show that our approach performs as well as, or better than, alternative methods for keypoint detection, even for human keypoints, for which the literature is vast. Moreover, we observe that CoKe is exceptionally robust to partial occlusion and previously unseen object poses.
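The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `KeypointBank`, the function `contrastive_loss`, the `temperature` parameter, the cosine-similarity softmax, and the plain running-mean form of the cumulative moving average are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class KeypointBank:
    """Hypothetical keypoint bank: one running-mean prototype per keypoint."""

    def __init__(self, num_keypoints, dim):
        self.prototypes = [[0.0] * dim for _ in range(num_keypoints)]
        self.counts = [0] * num_keypoints

    def update(self, k, feature):
        # Assumed update rule (standard cumulative moving average):
        # mean_{n+1} = mean_n + (x - mean_n) / (n + 1)
        self.counts[k] += 1
        n = self.counts[k]
        self.prototypes[k] = [
            m + (x - m) / n for m, x in zip(self.prototypes[k], feature)
        ]


def contrastive_loss(feature, k, bank, clutter_bank, temperature=0.1):
    """Softmax cross-entropy of one feature against all prototypes and clutter.

    The prototype of keypoint k is the positive; the other keypoint
    prototypes and every clutter feature act as negatives (an assumed form).
    """
    f = l2_normalize(feature)
    logits = [dot(f, l2_normalize(p)) / temperature for p in bank.prototypes]
    logits += [dot(f, l2_normalize(c)) / temperature for c in clutter_bank]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[k]


# Usage: two keypoints in a 2-D feature space and one clutter feature.
bank = KeypointBank(num_keypoints=2, dim=2)
bank.update(0, [1.0, 0.0])
bank.update(0, [0.0, 1.0])   # prototype 0 becomes the mean of both features
bank.update(1, [0.0, -1.0])
clutter_bank = [[-1.0, 0.0]]
loss_matching = contrastive_loss([1.0, 1.0], 0, bank, clutter_bank)
loss_mismatched = contrastive_loss([1.0, 1.0], 1, bank, clutter_bank)
```

In this sketch, a feature that lies close to its own keypoint prototype receives a lower loss than one compared against the wrong prototype, which is the behaviour the abstract attributes to making keypoint features different from each other and from the background.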