Paper Title
STAViS: Spatio-Temporal AudioVisual Saliency Network
Paper Authors
Paper Abstract
We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information to efficiently address the problem of saliency estimation in videos. Our approach employs a single network that combines visual saliency and auditory features, learning to appropriately localize sound sources and to fuse the two saliencies into a final saliency map. The network has been designed, trained end-to-end, and evaluated on six different databases that contain audiovisual eye-tracking data for a large variety of videos. We compare our method against eight different state-of-the-art visual saliency models. Evaluation results across databases indicate that our STAViS model outperforms both our visual-only variant and the other state-of-the-art models in the majority of cases. Moreover, the consistently good performance it achieves across all databases indicates that it is well suited to estimating saliency "in the wild". The code is available at https://github.com/atsiami/STAViS.
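To make the fusion idea in the abstract concrete, the following is a minimal PyTorch sketch of one way to localize sound sources and fuse an audio-derived saliency map with a visual one. It is an illustration under stated assumptions, not the authors' actual architecture: the module name AudioVisualFusion, the layer choices, and the tensor shapes are all hypothetical, and the cosine-correlation localization step is a simplified stand-in for the paper's learned sound-source localization.

```python
# Hypothetical sketch of audiovisual saliency fusion; NOT the STAViS architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioVisualFusion(nn.Module):
    def __init__(self, vis_channels=256, aud_dim=128):
        super().__init__()
        # Project clip-level audio features into the visual feature space
        # so the two modalities can be compared location by location.
        self.audio_proj = nn.Linear(aud_dim, vis_channels)
        # Learned fusion of the visual and audio saliency maps.
        self.fuse = nn.Conv2d(2, 1, kernel_size=1)

    def forward(self, vis_feats, vis_saliency, aud_feats):
        # vis_feats:    (B, C, H, W) spatio-temporal visual features
        # vis_saliency: (B, 1, H, W) visual-only saliency map
        # aud_feats:    (B, D) clip-level auditory features
        a = self.audio_proj(aud_feats)                    # (B, C)
        # Cosine correlation between the audio embedding and each spatial
        # location: a crude stand-in for sound-source localization.
        v = F.normalize(vis_feats, dim=1)
        a = F.normalize(a, dim=1)[:, :, None, None]       # (B, C, 1, 1)
        aud_saliency = (v * a).sum(dim=1, keepdim=True)   # (B, 1, H, W)
        # Fuse the two saliencies into the final map.
        fused = self.fuse(torch.cat([vis_saliency, aud_saliency], dim=1))
        return torch.sigmoid(fused)

# Usage on dummy tensors:
model = AudioVisualFusion()
vis_feats = torch.randn(2, 256, 28, 28)
vis_sal = torch.rand(2, 1, 28, 28)
aud_feats = torch.randn(2, 128)
print(model(vis_feats, vis_sal, aud_feats).shape)  # torch.Size([2, 1, 28, 28])
```

In this sketch the fusion weights are learned jointly with the localization projection, which mirrors the abstract's point that a single network learns both to localize sound sources and to combine the two saliencies; the paper itself should be consulted for the actual losses and layers used.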