Paper Title
Unsupervised Video Representation Learning by Bidirectional Feature Prediction
Paper Authors
Paper Abstract
This paper introduces a novel method for self-supervised video representation learning via feature prediction. In contrast to previous methods that focus on future feature prediction, we argue that a supervisory signal arising from unobserved past frames is complementary to one that originates from future frames. The rationale behind our method is to encourage the network to explore the temporal structure of videos by distinguishing between future and past given present observations. We train our model in a contrastive learning framework, where joint encoding of the future and past provides a comprehensive set of temporal hard negatives via swapping. We empirically show that utilizing both signals enriches the learned representations for the downstream task of action recognition, outperforming independent prediction of the future and the past.
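The abstract describes a contrastive objective in which the joint encoding of future and past clips, once the two are swapped, acts as a temporal hard negative for the observed present clip. The sketch below is a minimal, hypothetical illustration of such an objective, not the authors' implementation: the function name `bidirectional_nce`, the use of feature concatenation as the joint encoding, and the tiled present-clip query are all assumptions made here for clarity.

```python
# Minimal sketch (assumed, not the authors' code) of a bidirectional
# contrastive objective with swapped future/past joint encodings as
# temporal hard negatives.
import torch
import torch.nn.functional as F

def bidirectional_nce(z_present, z_future, z_past, temperature=0.07):
    """Contrast the present clip against joint (future, past) encodings.

    z_present: (B, D) features of the observed present frames
    z_future:  (B, D) features of the unobserved future frames
    z_past:    (B, D) features of the unobserved past frames
    """
    B, _ = z_present.shape
    # Joint encoding in the correct temporal order -> positives.
    pos = torch.cat([z_future, z_past], dim=1)           # (B, 2D)
    # Swapped order -> temporal hard negatives.
    neg_swap = torch.cat([z_past, z_future], dim=1)       # (B, 2D)

    # Query from the present clip (assumption: simple tiling; a learned
    # projection head would normally map the present features here).
    query = torch.cat([z_present, z_present], dim=1)      # (B, 2D)

    query = F.normalize(query, dim=1)
    pos = F.normalize(pos, dim=1)
    neg_swap = F.normalize(neg_swap, dim=1)

    # Similarities: other batch items act as the usual in-batch negatives,
    # and every swapped encoding is an additional (hard) negative.
    logits_pos = query @ pos.t() / temperature            # (B, B)
    logits_swap = query @ neg_swap.t() / temperature      # (B, B)
    logits = torch.cat([logits_pos, logits_swap], dim=1)  # (B, 2B)

    # The correctly ordered joint encoding of the same sample is the positive.
    targets = torch.arange(B, device=z_present.device)
    return F.cross_entropy(logits, targets)
```

In this reading, distinguishing `pos` from `neg_swap` is exactly what forces the network to tell future from past given the present observation, which is the behavior the abstract attributes to the swapping mechanism.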