ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes

Paper Title

ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes

Authors

Ooi, Kenneth; Ong, Zhen-Ting; Watcharasupat, Karn N.; Lam, Bhan; Hong, Joo Young; Gan, Woon-Seng

Abstract

Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial, owing to the wide variety of available maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and an independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018. Participants also provided relevant demographic information and completed standard psychological questionnaires. We perform exploratory and statistical analysis of the responses obtained to verify internal consistency and agreement with known results in the literature. Finally, we demonstrate the benchmarking capability of the dataset by training and comparing four baseline models for urban soundscape pleasantness: a low-parameter regression model, a high-parameter convolutional neural network, and two attention-based networks from the literature.
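The augmentation step described in the abstract (digitally adding a masker to a recording at a fixed soundscape-to-masker ratio, SMR) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the SMR is the difference, in dB, between the unweighted RMS levels of the soundscape and the scaled masker, whereas the real dataset may calibrate with A-weighted or other level metrics. The names `rms` and `augment_soundscape` are hypothetical, and the random noise signals merely stand in for real recordings.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square amplitude of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def augment_soundscape(soundscape: np.ndarray, masker: np.ndarray, smr_db: float) -> np.ndarray:
    """Hypothetical helper: add `masker` to `soundscape`, scaled so that the
    soundscape-to-masker ratio (assumed here to be an unweighted RMS level
    difference in dB) equals `smr_db`."""
    if soundscape.shape != masker.shape:
        raise ValueError("soundscape and masker must have the same shape")
    masker_rms = rms(masker)
    if masker_rms == 0.0:
        # An all-zero masker cannot be rescaled; returning the soundscape
        # unchanged is one way to model a "silence" condition (assumption).
        return soundscape.copy()
    # 20*log10(rms_s / rms_m') = smr_db  =>  rms_m' = rms_s / 10**(smr_db / 20)
    target_rms = rms(soundscape) / (10.0 ** (smr_db / 20.0))
    return soundscape + masker * (target_rms / masker_rms)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 44100
    soundscape = rng.normal(scale=0.1, size=fs)  # 1 s placeholder "urban recording"
    masker = rng.normal(scale=0.5, size=fs)      # 1 s placeholder "masker"
    mixed = augment_soundscape(soundscape, masker, smr_db=6.0)
    scaled_masker = mixed - soundscape
    # Check that the achieved SMR matches the requested value.
    print(round(20 * np.log10(rms(soundscape) / rms(scaled_masker)), 6))  # → 6.0
```

Because only the masker is rescaled, the recording keeps its original level and the SMR alone sets how prominent the masker is; a lower (or negative) SMR makes the masker louder relative to the soundscape.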