Paper Title
TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation
Paper Authors
Paper Abstract
Video action anticipation aims to predict future action categories from observed frames. Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states and predict future actions from the hidden representations. It is well known that the recurrent pipeline is inefficient at capturing long-term information, which may limit its performance on the prediction task. To address this problem, this paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction (TTPP) framework, which repurposes a Transformer-style architecture to aggregate observed features and then leverages a lightweight network to progressively predict future features and actions. Specifically, predicted features along with predicted probabilities are accumulated into the inputs of subsequent predictions. We evaluate our approach on three action datasets, namely TVSeries, THUMOS-14, and TV-Human-Interaction. Additionally, we conduct a comprehensive study of several popular aggregation and prediction strategies. Extensive results show that TTPP not only outperforms state-of-the-art methods but is also more efficient.
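To make the two-stage design concrete, the sketch below illustrates one plausible reading of the abstract: a Transformer encoder aggregates observed frame features, and a lightweight predictor rolls forward step by step, feeding each predicted feature together with its predicted class probabilities back into the next step. All class names, layer sizes, and the exact progressive-update rule (`TTPPSketch`, `feat_predictor`, `pred_steps`, etc.) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal TTPP-style sketch in PyTorch. Hypothetical names and dimensions;
# the real model's configuration is not specified in the abstract.
import torch
import torch.nn as nn


class TTPPSketch(nn.Module):
    def __init__(self, feat_dim=1024, num_classes=20, num_layers=2,
                 num_heads=8, pred_steps=4):
        super().__init__()
        self.pred_steps = pred_steps
        # Transformer-style encoder that aggregates observed frame features.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        # Lightweight predictor: maps the previous feature plus its predicted
        # class probabilities to the next future feature.
        self.feat_predictor = nn.Sequential(
            nn.Linear(feat_dim + num_classes, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, observed):
        # observed: (batch, T_obs, feat_dim) features of observed frames.
        encoded = self.encoder(observed)       # aggregate history
        feat = encoded[:, -1]                  # summary of the observation
        logits_per_step = []
        for _ in range(self.pred_steps):
            logits = self.classifier(feat)
            probs = logits.softmax(dim=-1)
            # Accumulate the predicted feature and probabilities into the
            # input of the next step (the "progressive prediction" idea).
            feat = self.feat_predictor(torch.cat([feat, probs], dim=-1))
            logits_per_step.append(logits)
        # (batch, pred_steps, num_classes) anticipated action scores.
        return torch.stack(logits_per_step, dim=1)


if __name__ == "__main__":
    model = TTPPSketch()
    dummy = torch.randn(2, 16, 1024)  # 2 clips, 16 observed feature vectors
    print(model(dummy).shape)         # torch.Size([2, 4, 20])
```

Under these assumptions, the recursion is what avoids the recurrent encoder: history is aggregated once by self-attention, and only the small predictor runs per anticipated step.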