Detecting Violence in Video Based on Deep Features Fusion Technique

Paper Title

Detecting Violence in Video Based on Deep Features Fusion Technique

Authors

Jahlan, Heyam M. Bin; Elrefaei, Lamiaa A.

Abstract

With the rapid growth of surveillance cameras that monitor human activities in public places such as malls, streets, schools, and prisons, there is strong demand for systems that detect violent events automatically. Automatic analysis of video to detect violence is significant for law enforcement, and it helps to avoid social, economic, and environmental damage. Most systems today rely on human supervisors to detect violent scenes in video manually, which is inefficient and inaccurate. In this work, we are interested in physical violence involving two or more persons. We propose a novel method that detects violence by fusing two significantly different convolutional neural networks (CNNs), AlexNet and SqueezeNet. Each network is followed by a separate Convolutional Long Short-Term Memory (ConvLSTM) module to extract robust and richer features from a video in the final hidden state. The two resulting states are then fused and fed to a max-pooling layer. Finally, the features are classified using a series of fully connected layers and a softmax classifier. The performance of the proposed method is evaluated in terms of detection accuracy on three standard benchmark datasets: the Hockey Fight dataset, the Movie dataset, and the Violent Flow dataset. The results show accuracies of 97%, 100%, and 96%, respectively. A comparison with state-of-the-art techniques reveals the promising capability of the proposed method in recognizing violent videos.
