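Effectiveness of Text-to-Speech Pseudo Labels for Forced Alignment and Cross-Lingual Pretrained Models for Low-Resource Speech Recognition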

Paper Title

Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition

Authors

Anirudh Gupta, Rishabh Gaur, Ankur Dhuriya, Harveen Singh Chadha, Neeraj Chhimwal, Priyanshi Shah, Vivek Raghavan

Abstract

In recent years, end-to-end (E2E) automatic speech recognition (ASR) systems have achieved promising results given sufficient resources. Even for languages with little labelled data, state-of-the-art E2E ASR systems can be developed by pretraining on large amounts of data from high-resource languages and finetuning on low-resource languages. For many low-resource languages the current approaches remain challenging, since in many cases labelled data is not available in the open domain. In this paper we present an approach to create labelled data for Maithili, Bhojpuri and Dogri by utilising pseudo labels from text to speech for forced alignment. The created data was inspected for quality and then used to train a transformer-based wav2vec 2.0 ASR model. All data and models are available in the open domain.
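To make the term "forced alignment" concrete, the following is a minimal, generic sketch of Viterbi forced alignment over CTC-style frame log-probabilities. It is not the authors' pipeline, which the abstract does not describe in detail. The function name `ctc_forced_align`, the blank index `0`, and the toy probabilities are assumptions chosen for illustration only.

```python
import math


def ctc_forced_align(log_probs, targets, blank=0):
    """Return the most likely frame-level label path that spells `targets`.

    log_probs: list of T frames, each a list of per-class log-probabilities
               (hypothetical acoustic-model output).
    targets:   non-empty list of label ids, without blanks, assumed to be
               the known transcript of the audio.
    """
    # Interleave blanks around the labels: [blank, t1, blank, t2, ..., blank].
    ext = [blank]
    for label in targets:
        ext += [label, blank]

    T, S = len(log_probs), len(ext)
    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]

    # A path may start in the leading blank or in the first label.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one state, or skip the
            # blank between two different labels.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the last label or in the trailing blank.
    s = S - 1
    if score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG:
        raise ValueError("not enough frames to align the target sequence")

    # Follow the back-pointers from the last frame to the first.
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    path.reverse()
    return path


# Toy example: 4 frames, 3 classes (0 = blank), transcript [1, 2].
frames = [
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]
toy_log_probs = [[math.log(p) for p in frame] for frame in frames]
alignment = ctc_forced_align(toy_log_probs, [1, 2])
```

In this toy case each frame is assigned one label, so consecutive runs of the same label give the time span of each transcript token. A real system would take the log-probabilities from an acoustic model rather than from hand-written values.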
