CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization

Paper Title

CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization

Authors

Li, Yuxi; Lin, Weiyao; See, John; Xu, Ning; Xu, Shugong; Yan, Ke; Yang, Cong

Abstract

Most current pipelines for spatio-temporal action localization connect frame-wise or clip-wise detection results to generate action proposals, where only local information is exploited and efficiency is hindered by dense per-frame localization. In this paper, we propose the Coarse-to-Fine Action Detector (CFAD), an original end-to-end trainable framework for efficient spatio-temporal action localization. CFAD introduces a new paradigm that first estimates coarse spatio-temporal action tubes from video streams, and then refines the tubes' locations based on key timestamps. This concept is implemented by two key components in our framework, the Coarse Module and the Refine Module. The parameterized modeling of long temporal information in the Coarse Module helps obtain accurate initial tube estimates, while the Refine Module selectively adjusts the tube locations under the guidance of key timestamps. Compared with other methods, the proposed CFAD achieves competitive results on the action detection benchmarks UCF101-24, UCFSports and JHMDB-21, with an inference speed that is 3.3x faster than the nearest competitors.
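The coarse-to-fine idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: in the paper both stages are learned by the network, whereas here the coarse tube is assumed to be a per-coordinate polynomial in normalized time and the refinement is assumed to be a linear interpolation of box corrections between key frames. The names `coarse_tube` and `refine_tube`, the polynomial form, and the interpolation scheme are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, ...]  # (x1, y1, x2, y2)


def coarse_tube(coeffs: Sequence[Sequence[float]], num_frames: int) -> List[Box]:
    """Evaluate an assumed polynomial tube: one coefficient list per box coordinate.

    Each coordinate is sum_k coeffs[i][k] * t**k, with t normalized to [0, 1].
    """
    tube = []
    for f in range(num_frames):
        t = f / max(num_frames - 1, 1)
        tube.append(tuple(sum(c * t ** k for k, c in enumerate(coord))
                          for coord in coeffs))
    return tube


def refine_tube(tube: List[Box], key_boxes: Dict[int, Box]) -> List[Box]:
    """Adjust a coarse tube using refined boxes given only at key frames.

    The correction (refined minus coarse) is computed at each key frame and
    linearly interpolated for the frames in between (an assumed scheme).
    """
    keys = sorted(key_boxes)
    if not keys:
        return list(tube)
    offsets = {k: tuple(r - c for r, c in zip(key_boxes[k], tube[k]))
               for k in keys}
    refined = []
    for f, box in enumerate(tube):
        if f <= keys[0]:
            off = offsets[keys[0]]
        elif f >= keys[-1]:
            off = offsets[keys[-1]]
        else:
            left = max(k for k in keys if k <= f)
            right = min(k for k in keys if k >= f)
            if left == right:
                off = offsets[left]
            else:
                w = (f - left) / (right - left)
                off = tuple((1 - w) * a + w * b
                            for a, b in zip(offsets[left], offsets[right]))
        refined.append(tuple(c + o for c, o in zip(box, off)))
    return refined


# A box drifting to the right over 5 frames, corrected at two key frames.
coarse = coarse_tube([(10, 20), (10,), (50, 20), (90,)], num_frames=5)
refined = refine_tube(coarse, {0: (12, 10, 52, 90), 4: (30, 10, 70, 90)})
```

In this toy setup only two frames receive an explicit correction, which mirrors the abstract's point that refining at key timestamps avoids dense per-frame localization.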
