Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model

Paper Title

Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model

Authors

Hamid Mohammadi, Ehsan Nazerfard

Abstract

The significant growth of surveillance camera networks necessitates scalable AI solutions to efficiently analyze the large amount of video data produced by these networks. As a typical analysis performed on surveillance footage, video violence detection has recently received considerable attention. The majority of research has focused on improving existing methods using supervised methods, with little, if any, attention to the semi-supervised learning approaches. In this study, a reinforcement learning model is introduced that can outperform existing models through a semi-supervised approach. The main novelty of the proposed method lies in the introduction of a semi-supervised hard attention mechanism. Using hard attention, the essential regions of videos are identified and separated from the non-informative parts of the data. A model's accuracy is improved by removing redundant data and focusing on useful visual information in a higher resolution. Implementing hard attention mechanisms using semi-supervised reinforcement learning algorithms eliminates the need for attention annotations in video violence datasets, thus making them readily applicable. The proposed model utilizes a pre-trained I3D backbone to accelerate and stabilize the training process. The proposed model achieved state-of-the-art accuracy of 90.4% and 98.7% on RWF and Hockey datasets, respectively.
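
The hard attention step described in the abstract, keeping only the essential region of a video and discarding the rest, can be illustrated with a minimal sketch. This is not the authors' implementation: the square window, the normalized `center` coordinate, and the names `crop_window` and `hard_attention_crop` are all assumptions made for illustration. In the paper the attended location is chosen by a reinforcement learning agent; here it is simply passed in.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one H x W grayscale frame (illustrative type)


def crop_window(height: int, width: int,
                center: Tuple[float, float], size: int) -> Tuple[int, int]:
    """Map a normalized (y, x) center in [0, 1] to the top-left corner
    of a size x size window, clamped so it stays inside the frame."""
    cy, cx = center
    top = int(cy * height) - size // 2
    left = int(cx * width) - size // 2
    top = max(0, min(top, height - size))
    left = max(0, min(left, width - size))
    return top, left


def hard_attention_crop(clip: List[Frame],
                        center: Tuple[float, float], size: int) -> List[Frame]:
    """Keep the same spatial window in every frame and drop everything else.
    Assumption: one attention location per clip, chosen elsewhere."""
    height, width = len(clip[0]), len(clip[0][0])
    top, left = crop_window(height, width, center, size)
    return [[row[left:left + size] for row in frame[top:top + size]]
            for frame in clip]
```

Because the crop is a discrete selection rather than a soft weighting, it is not differentiable with respect to `center`, which is one common reason hard attention is trained with reinforcement learning instead of plain backpropagation.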

Paper Title