Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

Paper Title

Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

Authors

Peter Wu, Yifan Zhong, Alan W Black

Abstract

Existing multilingual speech NLP works focus on a relatively small subset of languages, and thus current linguistic understanding of languages predominantly stems from classical approaches. In this work, we propose a method to analyze language similarity using deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language family findings. Our approach provides a new direction for cross-lingual data augmentation in any speech-based NLP task.
