Paper Title
Multimodal Urban Sound Tagging with Spatiotemporal Context
Paper Authors
Abstract
Noise pollution significantly affects our daily life and urban development. Urban Sound Tagging (UST), which aims to analyze and monitor urban noise pollution, has recently attracted much attention. One weakness of previous UST studies is that the spatial and temporal context of sound signals, which contains complementary information about when and where the audio was recorded, has not been investigated. To address this problem, we propose in this paper a multimodal UST system that jointly mines the audio and its spatiotemporal context. To incorporate the characteristics of different acoustic features, two sets of four spectrograms are first extracted as the inputs of residual neural networks. The spatiotemporal context is then encoded and combined with the acoustic features to explore the effectiveness of multimodal learning in discriminating sound signals. Moreover, a data filtering approach is adopted in text processing to further improve the multimodal performance. We evaluate the proposed method on the Urban Sound Tagging task (Task 5) of the DCASE 2020 challenge, and the experimental results demonstrate its effectiveness.
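To make the fusion described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: it assumes a log-mel spectrogram encoded by a ResNet backbone, spatiotemporal metadata (e.g., hour of day, day of week, and a sensor/location index) mapped to embeddings, and the two feature vectors concatenated before a multi-label classification head. All module names, dimensions, the choice of ResNet18, and the tag count are illustrative assumptions.

```python
# Minimal sketch of audio + spatiotemporal fusion for multi-label urban sound tagging.
# NOT the paper's implementation; the architecture, dimensions, and the spatiotemporal
# encoding below are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class MultimodalUST(nn.Module):
    def __init__(self, num_classes=23, num_sensors=50, context_dim=64):
        super().__init__()
        # Audio branch: a ResNet backbone over a 1-channel log-mel spectrogram.
        resnet = models.resnet18(weights=None)
        resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.audio_encoder = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 512, 1, 1)

        # Spatiotemporal branch: embed hour of day, day of week, and sensor/location id.
        self.hour_emb = nn.Embedding(24, 16)
        self.day_emb = nn.Embedding(7, 8)
        self.sensor_emb = nn.Embedding(num_sensors, 16)
        self.context_mlp = nn.Sequential(nn.Linear(16 + 8 + 16, context_dim), nn.ReLU())

        # Fusion: concatenate audio and context features, then a multi-label head.
        self.classifier = nn.Linear(512 + context_dim, num_classes)

    def forward(self, spectrogram, hour, day, sensor):
        # spectrogram: (B, 1, n_mels, n_frames); hour/day/sensor: (B,) integer tensors.
        audio_feat = self.audio_encoder(spectrogram).flatten(1)  # (B, 512)
        context = torch.cat(
            [self.hour_emb(hour), self.day_emb(day), self.sensor_emb(sensor)], dim=1
        )
        context_feat = self.context_mlp(context)                 # (B, context_dim)
        fused = torch.cat([audio_feat, context_feat], dim=1)
        return self.classifier(fused)  # raw logits; use sigmoid + BCE loss for tagging


if __name__ == "__main__":
    model = MultimodalUST()
    spec = torch.randn(4, 1, 128, 256)       # dummy log-mel spectrograms
    hour = torch.randint(0, 24, (4,))
    day = torch.randint(0, 7, (4,))
    sensor = torch.randint(0, 50, (4,))
    print(model(spec, hour, day, sensor).shape)  # torch.Size([4, 23])
```

In this sketch the two modalities are fused by simple concatenation before the classifier; the paper's actual fusion strategy, spectrogram variants, and data filtering step are described in the full text and may differ.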