Paper Title
PV-NAS: Practical Neural Architecture Search for Video Recognition
Paper Authors
Paper Abstract
Recently, deep learning has been utilized to solve video recognition problems due to its prominent representation ability. Deep neural networks for video tasks are highly customized, and the design of such networks requires domain experts and costly trial-and-error tests. Recent advances in network architecture search have boosted image recognition performance by a large margin. However, the automatic design of video recognition networks remains less explored. In this study, we propose a practical solution, namely Practical Video Neural Architecture Search (PV-NAS). Our PV-NAS can efficiently search across a tremendously large space of architectures in a novel spatial-temporal network search space using gradient-based search methods. To avoid getting stuck in sub-optimal solutions, we propose a novel learning rate scheduler to encourage sufficient network diversity among the searched models. Extensive empirical evaluations show that the proposed PV-NAS achieves state-of-the-art performance with much fewer computational resources: 1) among light-weight models, our PV-NAS-L achieves 78.7% and 62.5% Top-1 accuracy on Kinetics-400 and Something-Something V2, outperforming the previous state-of-the-art method (i.e., TSM) by a large margin (4.6% and 3.4% on each dataset, respectively), and 2) among medium-weight models, our PV-NAS-M achieves the best performance (also a new record) on the Something-Something V2 dataset.
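The abstract does not spell out the search mechanics, but the mention of gradient-based search over a spatial-temporal search space suggests a DARTS-style relaxation. Below is a minimal sketch of that idea, assuming PyTorch; the candidate operations, kernel shapes, and class name `MixedSpatialTemporalOp` are illustrative assumptions, not the exact PV-NAS design.

```python
# Sketch of DARTS-style gradient-based search over spatial-temporal
# candidate ops. All op choices here are hypothetical stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedSpatialTemporalOp(nn.Module):
    """Weighted sum of candidate ops; the softmax weights (architecture
    parameters) are trained by gradient descent alongside the network
    weights, so the search itself is differentiable."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            # Spatial-only convolution (1 x 3 x 3 over T x H x W).
            nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1)),
            # Temporal-only convolution (3 x 1 x 1).
            nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0)),
            # Full spatial-temporal convolution (3 x 3 x 3).
            nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=1),
            # Skip connection.
            nn.Identity(),
        ])
        # One learnable architecture parameter per candidate op.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        # x: (batch, channels, time, height, width)
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

After search converges, the candidate with the largest architecture weight in each layer would typically be kept to form the final discrete network; the paper's learning rate scheduler is said to keep these weights from collapsing onto one candidate too early, though its exact form is not given in the abstract.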