Paper Title
Detecting Violence in Video Based on Deep Features Fusion Technique
Paper Authors
Abstract
With the rapid growth of surveillance cameras installed to monitor human activities in many public places, such as malls, streets, schools, and prisons, there is a strong demand for systems that detect violent events automatically. Automatic analysis of video to detect violence is significant for law enforcement. Moreover, it helps to avoid social, economic, and environmental damage. Almost all systems today require human supervisors to detect violent scenes in video manually, which is inefficient and inaccurate. In this work, we are interested in physical violence involving two or more persons. This work proposes a novel method to detect violence using a fusion technique of two significantly different convolutional neural networks (CNNs), AlexNet and SqueezeNet. Each network is followed by a separate Convolutional Long Short-Term Memory (ConvLSTM) network that extracts more robust and richer features from a video into its final hidden state. The two final hidden states are then fused and fed to a max-pooling layer. Finally, the features are classified using a series of fully connected layers and a softmax classifier. The performance of the proposed method is evaluated in terms of detection accuracy on three standard benchmark datasets: the Hockey Fight dataset, the Movie dataset, and the Violent Flow dataset. The results show accuracies of 97%, 100%, and 96%, respectively. A comparison with state-of-the-art techniques reveals the promising capability of the proposed method in recognizing violent videos.
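The data flow described in the abstract (two CNN streams, a separate ConvLSTM per stream, fusion of the two final hidden states, max pooling, fully connected layers, softmax) can be sketched in NumPy as below. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: the AlexNet and SqueezeNet backbones are replaced by random stand-in feature maps (assumed to share the same spatial size), the ConvLSTM uses a common peephole-free formulation, the fusion is assumed to be channel-wise concatenation, and the "series of fully connected layers" is assumed to be one ReLU hidden layer plus a two-class output. All weights are random and untrained, and the names `conv2d_same`, `max_pool2x2`, `ConvLSTMCell`, and `FusionViolenceClassifier` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def conv2d_same(x, w, b):
    """Stride-1, zero-padded 'same' 2-D convolution (cross-correlation).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            # Shifted window of the padded input, weighted by one kernel tap.
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out + b[:, None, None]


def max_pool2x2(x):
    """2x2 max pooling with stride 2 over a (C, H, W) map (odd edges cropped)."""
    c, h, w = x.shape
    x = x[:, : h - h % 2, : w - w % 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


class ConvLSTMCell:
    """ConvLSTM: LSTM gates computed by convolutions, so states stay spatial."""

    def __init__(self, in_ch, hid_ch, k, rng):
        self.hid_ch = hid_ch
        # One convolution over [input, hidden] yields all four gate pre-activations.
        self.w = rng.normal(0.0, 0.1, (4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros(4 * hid_ch)

    def final_hidden_state(self, frames):
        """frames: (T, C_in, H, W) per-frame CNN feature maps -> (hid_ch, H, W)."""
        _, _, h, w = frames.shape
        hid = np.zeros((self.hid_ch, h, w))
        cell = np.zeros_like(hid)
        for x in frames:
            z = conv2d_same(np.concatenate([x, hid]), self.w, self.b)
            i, f, o, g = np.split(z, 4)  # input, forget, output gates; candidate
            cell = sigmoid(f) * cell + sigmoid(i) * np.tanh(g)
            hid = sigmoid(o) * np.tanh(cell)
        return hid


class FusionViolenceClassifier:
    """Two ConvLSTM streams -> fusion -> max pool -> FC layers -> softmax.

    Weights are random and untrained; this only illustrates the data flow.
    """

    def __init__(self, ch_a, ch_b, hid_ch, h, w, fc_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.stream_a = ConvLSTMCell(ch_a, hid_ch, 3, rng)  # after AlexNet features
        self.stream_b = ConvLSTMCell(ch_b, hid_ch, 3, rng)  # after SqueezeNet features
        flat = 2 * hid_ch * (h // 2) * (w // 2)
        self.w1 = rng.normal(0.0, 0.1, (fc_dim, flat))
        self.b1 = np.zeros(fc_dim)
        # Two output classes (order assumed: non-violent, violent).
        self.w2 = rng.normal(0.0, 0.1, (2, fc_dim))
        self.b2 = np.zeros(2)

    def __call__(self, feats_a, feats_b):
        h_a = self.stream_a.final_hidden_state(feats_a)
        h_b = self.stream_b.final_hidden_state(feats_b)
        fused = np.concatenate([h_a, h_b])  # assumed fusion: channel concatenation
        pooled = max_pool2x2(fused)
        hidden = np.maximum(0.0, self.w1 @ pooled.ravel() + self.b1)  # FC + ReLU
        return softmax(self.w2 @ hidden + self.b2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, H, W = 8, 6, 6
    feats_a = rng.normal(size=(T, 4, H, W))  # stand-in for AlexNet feature maps
    feats_b = rng.normal(size=(T, 6, H, W))  # stand-in for SqueezeNet feature maps
    model = FusionViolenceClassifier(4, 6, hid_ch=3, h=H, w=W)
    print(model(feats_a, feats_b))  # two class probabilities summing to 1
```

For real use, the stand-in arrays would be replaced by per-frame feature maps from pretrained AlexNet and SqueezeNet backbones, and all weights would be trained. Concatenation is used here only because it is a simple fusion that keeps both streams' channels intact before the shared max-pooling layer; other fusion operators would fit the same slot.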