Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition

Paper Title

Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition

Authors

Liu, Huabin, Lv, Weixian, See, John, Lin, Weiyao

Abstract

A primary challenge faced in few-shot action recognition is inadequate video data for training. To address this issue, current methods in this field mainly focus on devising algorithms at the feature level while little attention is paid to processing input video data. Moreover, existing frame sampling strategies may omit critical action information in temporal and spatial dimensions, which further impacts video utilization efficiency. In this paper, we propose a novel video frame sampler for few-shot action recognition to address this issue, where task-specific spatial-temporal frame sampling is achieved via a temporal selector (TS) and a spatial amplifier (SA). Specifically, our sampler first scans the whole video at a small computational cost to obtain a global perception of video frames. The TS plays its role in selecting top-T frames that contribute most significantly and subsequently. The SA emphasizes the discriminative information of each frame by amplifying critical regions with the guidance of saliency maps. We further adopt task-adaptive learning to dynamically adjust the sampling strategy according to the episode task at hand. Both the implementations of TS and SA are differentiable for end-to-end optimization, facilitating seamless integration of our proposed sampler with most few-shot action recognition methods. Extensive experiments show a significant boost in the performances on various benchmarks including long-term videos. The code is available at https://github.com/R00Kie-Liu/Sampler
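
As a rough illustration of the top-T frame selection mentioned in the abstract, the sketch below picks the T highest-scoring frames and returns them in temporal order. It is not the authors' implementation: the function name `select_top_t_frames`, the example scores, and the tie-breaking rule are all assumptions made for illustration. This hard selection is also not differentiable, whereas the abstract states that the authors' TS is; how that is achieved is not described here.

```python
from typing import List, Sequence


def select_top_t_frames(scores: Sequence[float], t: int) -> List[int]:
    """Return indices of the t highest-scoring frames, in temporal order."""
    if not 0 < t <= len(scores):
        raise ValueError("t must be between 1 and the number of frames")
    # Rank frame indices by score, highest first; ties keep the earlier frame.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Restore temporal order so the selected frames stay a coherent sequence.
    return sorted(ranked[:t])


# Hypothetical importance scores for an 8-frame video (assumed values).
frame_scores = [0.1, 0.7, 0.3, 0.9, 0.2, 0.8, 0.4, 0.05]
selected = select_top_t_frames(frame_scores, t=3)
print(selected)
```

In a real sampler the scores would come from a lightweight network that scans the whole video, as the abstract describes; here they are fixed numbers so the selection step can be read in isolation.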
