Paper Title
Unified Image and Video Saliency Modeling
Paper Authors
Paper Abstract
Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We identify the different sources of domain shift between image and video saliency data, and between different video saliency datasets, as a key challenge for effective joint modeling. To address this, we propose four novel domain adaptation techniques - Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing, and Bypass-RNN - in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train it jointly on image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2, and UCF-Sports, and on the image saliency datasets SALICON and MIT300. With one set of parameters, UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state of the art on the image saliency datasets, despite a faster runtime and a 5- to 20-fold smaller model size compared to all competing deep methods. We provide retrospective analyses and ablation studies that confirm the importance of the domain shift modeling. The code is available at https://github.com/rdroste/unisal.
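The abstract names the domain adaptation techniques but does not spell them out. As a rough illustration of two of the ideas, below is a minimal PyTorch-style sketch of (a) per-domain learned Gaussian prior maps and (b) a recurrent layer with a bypass path for static-image domains. All names here (DomainAdaptivePrior, BypassRNN, num_gaussians, video_domains) are hypothetical stand-ins and are not taken from the actual UNISAL implementation in the linked repository.

```python
import torch
import torch.nn as nn


class DomainAdaptivePrior(nn.Module):
    """Learned Gaussian prior maps with a separate parameter set per domain (sketch)."""

    def __init__(self, num_domains, num_gaussians=4, map_size=(24, 32)):
        super().__init__()
        h, w = map_size
        # Per-domain Gaussian parameters: means, log-std-devs, and mixture weights.
        self.mu = nn.Parameter(torch.rand(num_domains, num_gaussians, 2))
        self.log_sigma = nn.Parameter(torch.zeros(num_domains, num_gaussians, 2))
        self.weight = nn.Parameter(torch.ones(num_domains, num_gaussians))
        # Fixed normalized coordinate grid over the prior map
        # (the indexing kwarg assumes PyTorch >= 1.10).
        grid_y, grid_x = torch.meshgrid(
            torch.linspace(0, 1, h), torch.linspace(0, 1, w), indexing="ij"
        )
        self.register_buffer("grid", torch.stack([grid_x, grid_y], dim=-1))  # (H, W, 2)

    def forward(self, domain_idx):
        mu = self.mu[domain_idx]                  # (G, 2)
        sigma = self.log_sigma[domain_idx].exp()  # (G, 2)
        w = self.weight[domain_idx]               # (G,)
        diff = self.grid.unsqueeze(2) - mu        # (H, W, G, 2)
        g = torch.exp(-0.5 * (diff / sigma).pow(2).sum(dim=-1))  # (H, W, G)
        return (g * w).sum(dim=-1)                # (H, W) prior map for this domain


class BypassRNN(nn.Module):
    """Recurrent layer that is skipped for static-image domains (sketch)."""

    def __init__(self, channels, video_domains):
        super().__init__()
        self.rnn = nn.GRU(channels, channels, batch_first=True)
        self.video_domains = set(video_domains)

    def forward(self, x, domain_idx):
        # x: (batch, time, channels); images are treated as single-frame sequences.
        if domain_idx in self.video_domains:
            out, _ = self.rnn(x)
            return x + out  # residual path: the RNN refines rather than replaces
        return x  # bypass temporal modeling entirely for image data
```

The design idea this sketch tries to capture is that routing each training batch by its dataset of origin lets the shared encoder-decoder weights stay common to all domains, while the small per-domain parameter sets absorb the domain shift. For example, prior = DomainAdaptivePrior(num_domains=5) followed by prior_map = prior(domain_idx=2) would yield one (24, 32) prior map specific to the third dataset.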