Convolutional Hierarchical Attention Network for Query-Focused Video Summarization

Paper title

Convolutional Hierarchical Attention Network for Query-Focused Video Summarization

Authors

Shuwen Xiao, Zhou Zhao, Zijian Zhang, Xiaohui Yan, Min Yang

Abstract

Previous approaches to video summarization mainly concentrate on finding the most diverse and representative visual content as the video summary, without considering the user's preferences. This paper addresses the task of query-focused video summarization, which takes a user's query and a long video as inputs and aims to generate a query-focused video summary. We treat the task as a problem of computing the similarity between video shots and the query. To this end, we propose a method named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: a feature encoding network and a query-relevance computing module. In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot. The encoded features are then sent to the query-relevance computing module to generate the query-focused video summary. Extensive experiments on the benchmark dataset demonstrate the competitive performance and show the effectiveness of our approach.
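The framing of summarization as shot-query similarity can be illustrated with a toy ranking sketch. This is not the CHAN model: the abstract does not specify the similarity function or the selection rule, so cosine similarity and top-k selection are assumptions made here for illustration, and the names `cosine_similarity` and `summarize_by_query` are hypothetical. The shot features stand in for the output of an encoder such as the one the abstract describes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def summarize_by_query(shot_features, query_embedding, k):
    """Score every shot against the query and keep the k best.

    Returns the selected shot indices in temporal order, plus all scores.
    Top-k selection is an assumption, not the paper's stated procedure.
    """
    scores = [cosine_similarity(f, query_embedding) for f in shot_features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data: four shots with 2-dimensional placeholder features.
shots = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.0]
selected, scores = summarize_by_query(shots, query, k=2)
print(selected)
```

In a real system the features would come from a learned encoder and the relevance score from a trained module; the sketch only shows how per-shot scores turn into a summary once they exist.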
