Paper Title
Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
Paper Authors
Paper Abstract
Learning to localize the sound source in videos without explicit annotations is a novel area of audio-visual research. Existing work in this area focuses on creating attention maps to capture the correlation between the two modalities to localize the source of the sound. In a video, oftentimes, the objects exhibiting movement are the ones generating the sound. In this work, we capture this characteristic by modeling the optical flow in a video as a prior to better aid in localizing the sound source. We further demonstrate that the addition of flow-based attention substantially improves visual sound source localization. Finally, we benchmark our method on standard sound source localization datasets and achieve state-of-the-art performance on the SoundNet-Flickr and VGG Sound Source datasets. Code: https://github.com/denfed/heartheflow.
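The core idea in the abstract, using optical flow as a spatial prior that re-weights an audio-visual correlation map, can be illustrated with a minimal numpy sketch. This is not the authors' implementation (see the linked repository for that); the feature shapes, the random flow-magnitude map, and the multiplicative combination of similarity and prior are all simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 7x7 grid of 512-d visual features, a 512-d
# audio embedding, and a 7x7 optical-flow magnitude map (stand-ins
# for real network outputs and a real flow estimator).
H, W, D = 7, 7, 512
visual = rng.standard_normal((H, W, D))
audio = rng.standard_normal(D)
flow_mag = np.abs(rng.standard_normal((H, W)))

def cosine_map(visual, audio):
    """Cosine similarity between the audio embedding and each spatial cell."""
    v = visual / np.linalg.norm(visual, axis=-1, keepdims=True)
    a = audio / np.linalg.norm(audio)
    return v @ a  # shape (H, W)

def flow_prior(flow_mag):
    """Min-max normalize flow magnitude to [0, 1] to use as a spatial prior."""
    f = flow_mag - flow_mag.min()
    return f / (f.max() + 1e-8)

sim = cosine_map(visual, audio)      # audio-visual correlation map
att = sim * flow_prior(flow_mag)     # moving regions weighted up
loc = np.unravel_index(att.argmax(), att.shape)  # predicted source cell
print(loc)
```

The intuition this encodes is the one stated in the abstract: cells with both high audio-visual similarity and high motion dominate the attention map, so static but visually similar regions are suppressed.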