Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

Title


Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

Authors

Ju, Chen; Zhao, Peisen; Zhang, Ya; Wang, Yanfeng; Tian, Qi

Abstract


Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels. However, such a framework inevitably suffers from a large solution space. This paper explores the proposal-based prediction paradigm for point-level annotations, which has the advantages of a more constrained solution space and consistent predictions among neighboring frames. The point-level annotations are first used as keypoint supervision to train a keypoint detector. At the location prediction stage, a simple but effective mapper module, which enables back-propagation of training errors, is then introduced to bridge the fully-supervised framework with weak supervision. To the best of our knowledge, this is the first work to leverage the fully-supervised paradigm for the point-level setting. Experiments on THUMOS14, BEOID, and GTEA verify the effectiveness of the proposed method both quantitatively and qualitatively, and demonstrate that it outperforms state-of-the-art methods.
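The abstract does not specify how the keypoint targets or the mapper are built. As a rough illustration only, the Python sketch below makes two assumptions that are not stated in the source: point labels are rendered as Gaussian heatmap targets for a keypoint detector (a common CenterNet-style choice), and a mapper turns proposals given as (start, end, confidence) into smooth per-frame foreground scores, using sigmoid-edged masks combined with a noisy-OR. Because every step is smooth, a point-level loss evaluated only at the annotated frames still depends continuously on the proposal boundaries, which is the property a back-propagatable mapper needs. All function names are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating the idea; NOT the paper's actual code.


def gaussian_keypoint_targets(num_frames: int, points: Sequence[int],
                              sigma: float = 2.0) -> List[float]:
    """Render each annotated timestamp as a 1-D Gaussian peak (assumed target form)."""
    heat = [0.0] * num_frames
    for p in points:
        for t in range(num_frames):
            value = math.exp(-((t - p) ** 2) / (2.0 * sigma ** 2))
            heat[t] = max(heat[t], value)  # overlapping peaks are merged with max
    return heat


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_proposal_mask(num_frames: int, start: float, end: float,
                       tau: float = 1.0) -> List[float]:
    """Smooth indicator of [start, end]: near 1 inside, near 0 outside.

    Unlike a hard 0/1 box, it varies smoothly with start and end,
    so a loss on frame scores carries information about the boundaries.
    """
    return [sigmoid((t - start) / tau) * sigmoid((end - t) / tau)
            for t in range(num_frames)]


def proposals_to_frame_scores(num_frames: int,
                              proposals: Sequence[Tuple[float, float, float]],
                              tau: float = 1.0) -> List[float]:
    """Hypothetical mapper: combine (start, end, confidence) proposals
    into per-frame foreground scores with a smooth noisy-OR."""
    not_covered = [1.0] * num_frames
    for start, end, confidence in proposals:
        mask = soft_proposal_mask(num_frames, start, end, tau)
        for t in range(num_frames):
            not_covered[t] *= 1.0 - confidence * mask[t]
    return [1.0 - q for q in not_covered]


def point_level_loss(frame_scores: Sequence[float], points: Sequence[int],
                     eps: float = 1e-6) -> float:
    """Assumed weak loss: every annotated frame should be scored as foreground."""
    return -sum(math.log(max(frame_scores[p], eps)) for p in points) / len(points)
```

For example, with annotated frames `[5, 14]` in a 20-frame clip and proposals `[(3.0, 8.0, 0.9), (12.0, 17.0, 0.8)]`, moving the first proposal's start slightly earlier raises its mask value at frame 5 and lowers `point_level_loss`; that change is the kind of training signal an autograd framework would propagate back to the proposal boundaries. The noisy-OR is used instead of a hard max so the mapping stays smooth everywhere, and `tau` controls how sharp the proposal edges are.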
