Paper Title

VocalSound: A Dataset for Improving Human Vocal Sounds Recognition

Paper Authors

Gong, Yuan; Yu, Jin; Glass, James

Paper Abstract

Recognizing human non-speech vocalizations is an important task with broad applications such as automatic sound transcription and health condition monitoring. However, existing datasets have relatively few vocal sound samples or noisy labels. As a consequence, state-of-the-art audio event classification models may not perform well at detecting human vocal sounds. To support research on building robust and accurate vocal sound recognition, we have created the VocalSound dataset, consisting of over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. Experiments show that the vocal sound recognition performance of a model can be significantly improved by 41.9% by adding the VocalSound dataset to an existing dataset as training material. In addition, unlike previous datasets, the VocalSound dataset contains meta information such as speaker age, gender, native language, country, and health condition.
