Paper Title

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Authors

Minghao Chen, Fangyun Wei, Chong Li, Deng Cai

Abstract

Prior works on action representation learning mainly focus on designing various architectures to extract global representations for short video clips. In contrast, many practical applications, such as video alignment, have a strong demand for learning dense representations of long videos. In this paper, we introduce a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner. Concretely, we introduce a simple yet efficient video encoder that considers spatio-temporal context to extract frame-wise representations. Inspired by recent progress in self-supervised learning, we present a novel sequence contrastive loss (SCL) applied to two correlated views obtained through a series of spatio-temporal data augmentations. SCL optimizes the embedding space by minimizing the KL-divergence between the sequence similarity of the two augmented views and a prior Gaussian distribution over timestamp distance. Experiments on the FineGym, PennAction and Pouring datasets show that our method outperforms the previous state of the art by a large margin on downstream fine-grained action classification. Surprisingly, despite not being trained on paired videos, our approach also shows outstanding performance on video alignment and fine-grained frame retrieval tasks. Code and models are available at https://github.com/minghchen/CARL_code.
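The SCL objective described above can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the function name, the temperature `tau`, and the Gaussian width `sigma` are assumptions for the sake of the example. For each frame in one view, the similarity distribution over frames of the other view is aligned (via KL-divergence) with a Gaussian prior over timestamp distance.

```python
import numpy as np

def sequence_contrastive_loss(z1, z2, t1, t2, sigma=1.0, tau=0.1):
    """Illustrative sketch of a sequence contrastive loss (SCL).

    z1, z2: (T, D) L2-normalized frame embeddings of two augmented views.
    t1, t2: (T,) timestamps of each view's frames in the source video.
    sigma:  std of the Gaussian prior over timestamp distance (assumed value).
    tau:    softmax temperature (assumed value).
    Returns the mean KL(prior || predicted) over the frames of view 1.
    """
    # Predicted distribution: row-wise softmax over cosine similarities
    # between view-1 frames and all view-2 frames.
    sim = z1 @ z2.T / tau                      # (T, T) similarity logits
    sim = sim - sim.max(axis=1, keepdims=True) # numerical stability
    pred = np.exp(sim)
    pred /= pred.sum(axis=1, keepdims=True)

    # Prior: Gaussian over timestamp distance, normalized per row.
    d2 = (t1[:, None] - t2[None, :]) ** 2
    prior = np.exp(-d2 / (2.0 * sigma ** 2))
    prior /= prior.sum(axis=1, keepdims=True)

    # KL-divergence between the prior and predicted distributions,
    # averaged over frames.
    eps = 1e-8
    kl = (prior * (np.log(prior + eps) - np.log(pred + eps))).sum(axis=1)
    return kl.mean()
```

In words: frames of the two views that are close in time are pushed to be similar in the embedding space, with the strength of the attraction decaying as a Gaussian of their timestamp distance, which gives a dense, frame-level training signal without needing paired videos.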
