Paper Title

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

Authors

Min Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, Limin Wang

Abstract

Temporal action detection (TAD) is extensively studied in the video understanding community by generally following the object detection pipeline in images. However, complex designs are not uncommon in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, we study a simple, straightforward, yet must-know baseline given the current status of complex design and low detection efficiency in TAD. In our simple baseline (termed BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We extensively investigate the existing techniques in each component for this baseline, and more importantly, perform end-to-end training over the entire pipeline thanks to the simplicity of design. As a result, this simple BasicTAD yields an astounding real-time RGB-only baseline very close to the state-of-the-art methods with two-stream inputs. In addition, we further improve BasicTAD by preserving more temporal and spatial information in the network representation (termed PlusTAD). Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms the previous methods on the THUMOS14 and FineAction datasets. Meanwhile, we also perform in-depth visualization and error analysis on our proposed method and try to provide more insights into the TAD problem. Our approach can serve as a strong baseline for future TAD research. The code and model will be released at https://github.com/MCG-NJU/BasicTAD.
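The abstract describes decomposing the TAD pipeline into four essential components: data sampling, backbone, neck, and detection head, trained end to end. A minimal sketch of that decomposition is shown below; all class and function names are illustrative assumptions, not the actual BasicTAD API, and the stand-in components only mimic the shapes of the real stages (uniform frame sampling, a feature extractor, temporal downsampling, and a head emitting (start, end, score) proposals).

```python
from typing import List, Tuple

# Hypothetical sketch of the pipeline decomposition described in the
# abstract: data sampling -> backbone -> neck -> detection head.
# Names and internals are assumptions for illustration only.

def sample_frames(num_video_frames: int, num_samples: int) -> List[int]:
    """Data sampling: uniformly sample RGB frame indices from the video."""
    stride = num_video_frames / num_samples
    return [int(i * stride) for i in range(num_samples)]

class Backbone:
    """Stand-in for a video backbone mapping frames to a temporal feature sequence."""
    def forward(self, frames: List[int]) -> List[float]:
        # Placeholder: one scalar "feature" per sampled temporal position.
        return [float(f) for f in frames]

class Neck:
    """Stand-in for neck construction (temporal downsampling / pyramid)."""
    def forward(self, feats: List[float]) -> List[float]:
        return feats[::2]  # halve the temporal resolution

class DetectionHead:
    """Stand-in head emitting (start, end, score) action candidates."""
    def forward(self, feats: List[float]) -> List[Tuple[int, int, float]]:
        return [(i, i + 1, 1.0) for i in range(len(feats))]

def basic_tad_pipeline(num_video_frames: int, num_samples: int):
    """Chain the four components; in the real method this chain is trained end to end."""
    frames = sample_frames(num_video_frames, num_samples)
    feats = Backbone().forward(frames)
    feats = Neck().forward(feats)
    return DetectionHead().forward(feats)
```

Because every stage is a plain module in one chain, gradients can flow from the detection head back to the raw RGB input, which is the end-to-end training property the abstract highlights.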
