Paper Title

AVES: Animal Vocalization Encoder based on Self-Supervision

Authors

Hagiwara, Masato

Abstract

The lack of annotated training data in bioacoustics hinders the use of large-scale neural network models trained in a supervised way. In order to leverage a large amount of unannotated audio data, we propose AVES (Animal Vocalization Encoder based on Self-Supervision), a self-supervised, transformer-based audio representation model for encoding animal vocalizations. We pretrain AVES on a diverse set of unannotated audio datasets and fine-tune them for downstream bioacoustics tasks. Comprehensive experiments with a suite of classification and detection tasks have shown that AVES outperforms all the strong baselines and even the supervised "topline" models trained on annotated audio classification datasets. The results also suggest that curating a small training subset related to downstream tasks is an efficient way to train high-quality audio representation models. We open-source our models at \url{https://github.com/earthspecies/aves}.
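The pretrain-then-fine-tune recipe in the abstract reduces to a simple downstream pattern: encode a waveform into a sequence of frame-level embeddings, pool them into one clip embedding, and attach a task head (here, a linear classifier). A minimal NumPy sketch of that pattern follows; the encoder is a random stand-in for illustration only, not the actual AVES model, and all shapes and names (`frame_len`, `dim`, `num_classes`) are assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(waveform, frame_len=320, dim=768):
    """Hypothetical stand-in for an AVES-style encoder.

    The real model is a self-supervised transformer; here we only fake
    the output shape (T frames x D dims) to show the downstream usage.
    """
    n_frames = len(waveform) // frame_len
    return rng.standard_normal((n_frames, dim))

def classify(frame_emb, weight, bias):
    """Mean-pool frame embeddings into a clip embedding, then apply a
    linear probe to get per-class logits."""
    clip_emb = frame_emb.mean(axis=0)   # (D,)
    return clip_emb @ weight + bias     # (num_classes,)

waveform = rng.standard_normal(16000)   # 1 second of 16 kHz audio
emb = encode(waveform)                  # (50, 768) frame embeddings

num_classes = 10
W = rng.standard_normal((768, num_classes)) * 0.01
b = np.zeros(num_classes)
logits = classify(emb, W, b)            # (10,) class logits
```

In an actual fine-tuning setup, the encoder weights would be initialized from the pretrained checkpoint released at the repository above and updated jointly with the task head, rather than frozen random features as in this sketch.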
