Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Paper Title

Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Authors

Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

Abstract

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content. Given limited GPU memory, training TAL end to end (i.e., from videos to predictions) on long videos is a significant challenge. Most methods can only train on pre-extracted features without optimizing them for the localization problem, consequently limiting localization performance. In this work, to extend the potential in TAL networks, we propose a novel end-to-end method Re^2TAL, which rewires pretrained video backbones for reversible TAL. Re^2TAL builds a backbone with reversible modules, where the input can be recovered from the output such that the bulky intermediate activations can be cleared from memory during training. Instead of designing one single type of reversible module, we propose a network rewiring mechanism, to transform any module with a residual connection to a reversible module without changing any parameters. This provides two benefits: (1) a large variety of reversible networks are easily obtained from existing and even future model designs, and (2) the reversible models require much less training effort as they reuse the pretrained parameters of their original non-reversible versions. Re^2TAL, only using the RGB modality, reaches 37.01% average mAP on ActivityNet-v1.3, a new state-of-the-art record, and mAP 64.9% at tIoU=0.5 on THUMOS-14, outperforming all other RGB-only methods.
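The idea of a reversible module, where the input can be recovered from the output, can be illustrated with a small sketch. It is not the paper's implementation: it assumes a generic two-stream additive coupling, and `branch`, `reversible_forward`, `reversible_inverse`, and the scalar "weights" are hypothetical names and toy stand-ins for a pretrained residual branch F and its parameters. The point it shows is that the residual branch and its parameters are reused unchanged; only the wiring of the skip connection differs.

```python
import math

def branch(x, w):
    # Hypothetical stand-in for a pretrained residual branch F(x);
    # w plays the role of its (unchanged) pretrained parameters.
    return [math.tanh(w * v) for v in x]

def residual_block(x, w):
    # Ordinary residual block: y = x + F(x). Not invertible in general,
    # so x has to be kept in memory for the backward pass.
    return [a + b for a, b in zip(x, branch(x, w))]

def reversible_forward(x1, x2, w):
    # Assumed rewiring: same F and same w, but the skip connection comes
    # from the second stream: y1 = x2 + F(x1), y2 = x1.
    y1 = [a + b for a, b in zip(x2, branch(x1, w))]
    return y1, list(x1)

def reversible_inverse(y1, y2, w):
    # Recover the inputs from the outputs: x1 = y2, x2 = y1 - F(x1).
    x1 = list(y2)
    x2 = [a - b for a, b in zip(y1, branch(x1, w))]
    return x1, x2

def run_stack(x1, x2, weights):
    # Forward pass through a stack of reversible blocks.
    for w in weights:
        x1, x2 = reversible_forward(x1, x2, w)
    return x1, x2

def invert_stack(y1, y2, weights):
    # Walk the stack backwards, reconstructing each block's input
    # instead of reading it from a cache of stored activations.
    for w in reversed(weights):
        y1, y2 = reversible_inverse(y1, y2, w)
    return y1, y2

weights = [0.5, -1.2, 0.8]   # toy "pretrained" parameters
x = [0.1, -0.4, 2.0]         # toy input, duplicated into both streams
out1, out2 = run_stack(x, x, weights)
rec1, rec2 = invert_stack(out1, out2, weights)
max_err = max(abs(a - b) for a, b in zip(rec1 + rec2, x + x))
print(max_err < 1e-9)
```

Because each block's input can be recomputed from its output up to floating-point rounding, a training loop built this way would only need to keep the final activations and rebuild the intermediate ones during the backward pass, which is the memory saving the abstract describes.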
