Paper title
Bottom-Up Temporal Action Localization with Mutual Regularization
Paper authors
Paper abstract
Recently, temporal action localization (TAL), i.e., finding specific action segments in untrimmed videos, has attracted increasing attention from the computer vision community. State-of-the-art solutions for TAL involve evaluating the frame-level probabilities of three action-indicating phases, i.e., starting, continuing, and ending, and then post-processing these predictions for the final localization. This paper examines this mechanism and argues that existing methods, by modeling these phases as individual classification tasks, ignore the potential temporal constraints between them. This can lead to incorrect and/or inconsistent predictions when some frames of the video input lack sufficient discriminative information. To alleviate this problem, we introduce two regularization terms that mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization makes the predictions consistent within each phase, and the Inter-phase Consistency (InterC) regularization keeps the predictions consistent across phases. By jointly optimizing these two terms, the entire framework becomes aware of these potential constraints during end-to-end optimization. Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3. Our approach clearly outperforms the baseline both quantitatively and qualitatively. The proposed regularization also generalizes to other TAL methods (e.g., TSA-Net and PGCN). Code: https://github.com/PeisenZhao/Bottom-Up-TAL-with-MR