EURO: ESPnet Unsupervised ASR Open-source Toolkit

Paper Title

EURO: ESPnet Unsupervised ASR Open-source Toolkit

Authors

Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

Abstract

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by wav2vec-U, originally implemented in FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on the TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity.
