Paper Title
CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1)
Authors
Abstract
In this report, we present our solution to the temporal action localization (detection) task (Task 1) of the ActivityNet Challenge 2020. The goal of this task is to temporally localize the intervals in which actions of interest occur in a long untrimmed video and to predict their action categories. Our solution consists of three main components: 1) feature encoding: we apply three backbones, TSN [7], SlowFast [3] and I3D [1], all of which are pretrained on the Kinetics dataset [2]. With these models, we extract snippet-level video representations; 2) proposal generation: we choose BMN [5] as our baseline, based on which we design a Cascade Boundary Refinement Network (CBR-Net) to perform proposal detection. CBR-Net mainly contains two modules: a temporal feature encoding module, which applies a BiLSTM to encode long-term temporal information, and a CBR module, which aims to refine proposal precision under different parameter settings; 3) action localization: in this stage, we combine the video-level classification results obtained from the fine-tuned networks to predict the category of each proposal. Moreover, we apply different ensemble strategies to improve the performance of the designed solution, with which we achieve 42.788% on the testing set of the ActivityNet v1.3 dataset in terms of the mean Average Precision metric.
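The action localization stage described in the abstract can be illustrated with a minimal sketch. The abstract does not say how proposals and video-level class scores are fused, so the choices here are assumptions made for illustration only: each proposal inherits the top-k video-level classes, and the detection score is the product of the proposal confidence and the class probability. The function name `label_proposals`, the `top_k` parameter, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# A proposal is (start_sec, end_sec, confidence); this layout is an assumption.
Proposal = Tuple[float, float, float]


def label_proposals(
    proposals: List[Proposal],
    video_class_probs: Dict[str, float],
    top_k: int = 2,
) -> List[dict]:
    """Attach the top-k video-level classes to every proposal.

    Assumed fusion rule: detection score = proposal confidence * class probability.
    """
    # Keep only the k most probable video-level classes.
    top_classes = sorted(
        video_class_probs.items(), key=lambda kv: kv[1], reverse=True
    )[:top_k]

    detections = []
    for start, end, conf in proposals:
        for label, prob in top_classes:
            detections.append(
                {"segment": (start, end), "label": label, "score": conf * prob}
            )

    # Rank detections from most to least confident.
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections


# Hypothetical inputs: two proposals and a video-level class distribution.
proposals = [(12.0, 30.5, 0.9), (41.0, 55.0, 0.6)]
video_class_probs = {"Long jump": 0.7, "Triple jump": 0.2, "Javelin throw": 0.1}

detections = label_proposals(proposals, video_class_probs, top_k=2)
for det in detections:
    print(det)
```

With these inputs the sketch yields four detections, two per proposal, ranked by fused score. A real system could use a different fusion rule or a per-proposal classifier instead.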