Paper Title

Deep Unsupervised Key Frame Extraction for Efficient Video Classification

Paper Authors

Hao Tang, Lei Ding, Songsong Wu, Bin Ren, Nicu Sebe, Paolo Rota

Paper Abstract

Video processing and analysis have become an urgent task, since a huge number of videos (e.g., on YouTube and Hulu) are uploaded online every day. Extracting representative key frames from videos is very important in video processing and analysis, since it greatly reduces the required computing resources and time. Although great progress has been made recently, large-scale video classification remains an open problem, because existing methods do not balance performance and efficiency well. To tackle this problem, this work presents an unsupervised method to retrieve key frames that combines a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC). The proposed TSDPC is a generic and powerful framework with two advantages over previous works: it can determine the number of key frames automatically, and it preserves the temporal information of the video, which improves the efficiency of video classification. Furthermore, a Long Short-Term Memory network (LSTM) is added on top of the CNN to further improve classification performance, and a weight fusion strategy for networks with different inputs is presented to boost performance. By optimizing video classification and key frame extraction simultaneously, we achieve better classification performance and higher efficiency. We evaluate our method on two popular datasets (i.e., HMDB51 and UCF101), and the experimental results consistently demonstrate that our strategy achieves competitive performance and efficiency compared with state-of-the-art approaches.
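To make the key idea more concrete, below is a minimal, hypothetical sketch of density-peaks-based key frame selection over per-frame CNN features. It splits the frame sequence into equal-length temporal segments and, within each segment, picks the frame with the highest density-peak score (the product of the local density rho and the distance delta to the nearest denser point), following the generic density peaks clustering formulation of Rodriguez and Laio. The function names, the Gaussian-kernel density, and parameters such as num_segments and dc_percentile are illustrative assumptions, not the paper's exact TSDPC algorithm.

```python
# Hypothetical sketch: key frame selection via density peaks clustering over
# per-frame CNN features, one representative frame per temporal segment.
# This illustrates the general idea only; the paper's TSDPC may differ in detail.
import numpy as np

def density_peaks_scores(feats, dc_percentile=2.0):
    """Return local density rho and distance delta to the nearest denser frame."""
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    # Cutoff distance d_c chosen as a small percentile of pairwise distances (assumption).
    dc = np.percentile(dists[dists > 0], dc_percentile) if (dists > 0).any() else 1.0
    rho = np.exp(-(dists / dc) ** 2).sum(axis=1) - 1.0   # Gaussian-kernel density, minus self
    delta = np.zeros(len(feats))
    order = np.argsort(-rho)                              # frames sorted by decreasing density
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dists[i].max()                     # densest frame: maximum distance
        else:
            delta[i] = dists[i, order[:rank]].min()       # distance to nearest denser frame
    return rho, delta

def select_key_frames(frame_feats, num_segments=8):
    """Pick one key frame index per temporal segment by the rho * delta score."""
    T = len(frame_feats)
    bounds = np.linspace(0, T, num_segments + 1, dtype=int)
    key_idx = []
    for s, e in zip(bounds[:-1], bounds[1:]):
        if e <= s:
            continue
        rho, delta = density_peaks_scores(frame_feats[s:e])
        key_idx.append(s + int(np.argmax(rho * delta)))   # local density peak in this segment
    return key_idx

# Example: 120 frames with 512-D CNN features -> 8 key frame indices.
feats = np.random.rand(120, 512).astype(np.float32)
print(select_key_frames(feats))
```

The selected key frames would then be the only frames fed to the downstream CNN+LSTM classifier, which is what reduces the computation compared with processing every frame.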
