Paper Title
MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection
Paper Authors
Paper Abstract
We address weakly supervised video highlight detection: learning to detect the more attractive segments of training videos given only their video event labels, without the expensive supervision of manually annotated highlight segments. Although this avoids manually localizing highlight segments, weakly supervised modeling is challenging, because a video from daily life can contain highlight segments of multiple event types, e.g., skiing and surfing. In this work, we cast weakly supervised video highlight detection for a given specific event as learning a multiple instance ranking network (MINI-Net). We treat each video as a bag of segments, so the proposed MINI-Net learns to assign a higher highlight score to a positive bag, which contains highlight segments of the specific event, than to negative bags, which are irrelevant to it. In particular, we form a max-max ranking loss to obtain a reliable relative comparison between the most likely positive segment instance and the hardest negative segment instance. With this max-max ranking loss, MINI-Net effectively leverages all segment information to learn a more distinctive video feature representation for localizing the highlight segments of a specific event in a video. Extensive experimental results on three challenging public benchmarks validate the efficacy of our multiple instance ranking approach.
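The max-max comparison described in the abstract can be illustrated with a minimal sketch. The abstract states only that the loss compares the most likely positive segment with the hardest negative segment. The hinge form, the `margin` parameter, and the function name `max_max_ranking_loss` below are assumptions made for illustration, not the authors' implementation.

```python
def max_max_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge-style ranking loss between two bags of segment scores.

    pos_scores: highlight scores of the segments in a positive bag (video).
    neg_scores: highlight scores of the segments in a negative bag (video).
    margin: assumed hyperparameter; not given in the abstract.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both bags must contain at least one segment score")
    # Most likely positive instance: the top-scoring segment of the positive bag.
    top_pos = max(pos_scores)
    # Hardest negative instance: the top-scoring segment of the negative bag.
    top_neg = max(neg_scores)
    # Assumed hinge form: zero once the positive bag wins by at least `margin`.
    return max(0.0, margin - top_pos + top_neg)


# The positive bag's best segment (0.75) beats the negative bag's best (0.5),
# but by less than the margin, so a penalty remains.
print(max_max_ranking_loss([0.25, 0.75], [0.5, 0.125]))  # 0.75
```

In a real model the scores would come from a learned scoring network and the loss would be computed on tensors so that gradients flow back through the selected segments; this sketch shows only the comparison itself.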