Paper Title

Real-world Video Anomaly Detection by Extracting Salient Features in Videos

Authors

Watanabe, Yudai, Okabe, Makoto, Harada, Yasunori, Kashima, Naoji

Abstract

We propose a lightweight and accurate method for detecting anomalies in videos. Existing methods use multiple-instance learning (MIL) to determine the normal/abnormal status of each segment of the video. Recent successful studies argue that, to achieve high accuracy, it is important to learn the temporal relationships among segments rather than focusing on a single segment alone. We therefore analyzed the existing methods that have been successful in recent years and found that, while it is indeed important to learn all segments together, the temporal order among them is irrelevant to achieving high accuracy. Based on this finding, we do not use the MIL framework; instead, we propose a lightweight model with a self-attention mechanism that automatically extracts, from all input segments, the features that are important for determining normal/abnormal status. As a result, our neural network model has only 1.3% of the parameters of the existing method. We evaluated the frame-level detection accuracy of our method on three benchmark datasets (UCF-Crime, ShanghaiTech, and XD-Violence) and demonstrate that it achieves comparable or better accuracy than state-of-the-art methods.
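To make the core idea concrete, here is a minimal NumPy sketch (not the authors' implementation) of scoring video segments with self-attention: every segment attends to all other segments of the video, and because no positional encoding is added, the temporal order of the segments has no effect on their scores. All weight matrices, dimensions, and function names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_scores(segments, Wq, Wk, Wv, w_out):
    """Per-segment anomaly scores from pre-extracted segment features.

    segments: (N, D) array, one feature vector per video segment
              (e.g. features from a pretrained 3D CNN backbone).
    """
    Q, K, V = segments @ Wq, segments @ Wk, segments @ Wv
    d = Q.shape[-1]
    # (N, N) attention: each segment looks at ALL segments of the video.
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    ctx = attn @ V                       # (N, d) order-free mixture of all segments
    logits = ctx @ w_out                 # (N,) per-segment anomaly logits
    return 1.0 / (1.0 + np.exp(-logits)) # sigmoid -> score in [0, 1]

rng = np.random.default_rng(0)
D, d, N = 16, 8, 32                      # feature dim, attention dim, segments per video
Wq, Wk, Wv = (rng.standard_normal((D, d)) * 0.1 for _ in range(3))
w_out = rng.standard_normal(d) * 0.1

X = rng.standard_normal((N, D))          # stand-in for real segment features
scores = self_attention_scores(X, Wq, Wk, Wv, w_out)

# With no positional encoding, self-attention is permutation-equivariant:
# shuffling the segments just shuffles the scores, i.e. order is irrelevant.
perm = rng.permutation(N)
order_free = np.allclose(
    self_attention_scores(X[perm], Wq, Wk, Wv, w_out), scores[perm]
)
print(scores.shape, order_free)
```

The permutation check mirrors the paper's finding: the model learns from all segments jointly, but their temporal ordering does not influence the per-segment decisions.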
