Paper Title

Multi-shot Temporal Event Localization: a Benchmark

Authors

Xiaolong Liu, Yao Hu, Song Bai, Fei Ding, Xiang Bai, Philip H. S. Torr

Abstract

Current developments in temporal event or action localization usually target actions captured by a single camera. However, extensive events or actions in the wild may be captured as a sequence of shots by multiple cameras at different positions. In this paper, we propose a new and challenging task called multi-shot temporal event localization, and accordingly, collect a large-scale dataset called MUlti-Shot EventS (MUSES). MUSES has 31,477 event instances for a total of 716 video hours. The core nature of MUSES is the frequent shot cuts, with an average of 19 shots per instance and 176 shots per video, which induces large intra-instance variations. Our comprehensive evaluations show that the state-of-the-art method in temporal action localization only achieves an mAP of 13.1% at IoU=0.5. As a minor contribution, we present a simple baseline approach for handling the intra-instance variations, which reports an mAP of 18.9% on MUSES and 56.9% on THUMOS14 at IoU=0.5. To facilitate research in this direction, we release the dataset and the project code at https://songbai.site/muses/.
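
The mAP figures above are computed at a temporal IoU threshold of 0.5. As a quick illustration (not taken from the paper's released code), the following sketch shows how the temporal IoU between a predicted segment and a ground-truth segment is typically computed in this evaluation protocol; the function name and the example intervals are hypothetical.

def temporal_iou(pred, gt):
    """IoU of two 1-D time intervals given as (start, end) in seconds."""
    inter_start = max(pred[0], gt[0])
    inter_end = min(pred[1], gt[1])
    intersection = max(0.0, inter_end - inter_start)  # overlap length, 0 if disjoint
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0

# A prediction is counted as a true positive at IoU=0.5 only if its IoU
# with a matched ground-truth instance is at least 0.5.
print(temporal_iou((10.0, 30.0), (15.0, 35.0)))  # 0.6 -> true positive at IoU=0.5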
