Paper Title
TadML: A fast temporal action detection with Mechanics-MLP
Paper Authors
Paper Abstract
Temporal Action Detection (TAD) is a crucial but challenging task in video understanding. It aims to detect both the type and the start and end frames of each action instance in a long, untrimmed video. Most current models adopt both RGB and optical-flow streams for the TAD task, so raw RGB frames must first be converted into optical-flow frames at additional computation and time cost, which is an obstacle to real-time processing. Moreover, many models adopt two-stage strategies, which slow down inference and require complicated tuning of proposal generation. By comparison, we propose a one-stage, anchor-free temporal localization method that uses the RGB stream only, built on a novel Newtonian Mechanics-MLP architecture. It achieves accuracy comparable to all existing state-of-the-art models while surpassing their inference speed by a large margin: the typical inference speed reported in this paper is 4.44 videos per second on THUMOS14. In practical applications, inference is even faster because no optical flow needs to be computed. This also shows that MLPs have great potential in downstream tasks such as TAD. The source code is available at https://github.com/BonedDeng/TadML
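The abstract does not describe the Mechanics-MLP block in detail, so the following is only a minimal PyTorch sketch under assumed design choices: MLP-Mixer-style token/channel mixing over per-frame RGB features of shape (batch, frames, channels), followed by an anchor-free per-frame head that predicts class logits and start/end offsets. The names TemporalMixerBlock and AnchorFreeHead are hypothetical illustrations and are not taken from the authors' repository.

# Minimal sketch (not the authors' implementation) of an MLP-based temporal
# block with an anchor-free head, assuming per-frame RGB features (B, T, C).
import torch
import torch.nn as nn


class TemporalMixerBlock(nn.Module):
    """Token-mixing over the temporal axis followed by channel-mixing (assumed design)."""

    def __init__(self, num_frames: int, dim: int, hidden: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(      # mixes information across frames
            nn.Linear(num_frames, hidden), nn.GELU(), nn.Linear(hidden, num_frames)
        )
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(    # mixes information across channels
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        y = self.norm1(x).transpose(1, 2)                 # (B, C, T)
        x = x + self.token_mlp(y).transpose(1, 2)         # temporal mixing
        x = x + self.channel_mlp(self.norm2(x))           # channel mixing
        return x


class AnchorFreeHead(nn.Module):
    """Per-frame classification and start/end offset regression (anchor-free)."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.cls = nn.Linear(dim, num_classes)  # action class logits per frame
        self.reg = nn.Linear(dim, 2)            # distances to action start/end

    def forward(self, x: torch.Tensor):
        return self.cls(x), self.reg(x).relu()


if __name__ == "__main__":
    B, T, C = 2, 128, 512                          # clip of RGB features only
    feats = torch.randn(B, T, C)
    block = TemporalMixerBlock(num_frames=T, dim=C)
    head = AnchorFreeHead(dim=C, num_classes=20)   # THUMOS14 has 20 action classes
    logits, offsets = head(block(feats))
    print(logits.shape, offsets.shape)             # (2, 128, 20), (2, 128, 2)

Because the heads predict per-frame offsets directly, no anchors or proposal generation are needed, which is consistent with the one-stage, anchor-free design and the RGB-only input claimed in the abstract.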