Paper Title


CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Authors

Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li

Abstract


Speech is the surface form of a finite set of phonetic units, which can be represented by discrete codes. We propose the Code BERT (CoBERT) approach for self-supervised speech representation learning. The idea is to convert an utterance to a sequence of discrete codes and perform code representation learning, where we predict the code representations based on a masked view of the original speech input. Unlike prior self-distillation approaches, in which the teacher and the student share the same modality, our target model predicts representations from a different modality. CoBERT surpasses the most recent state-of-the-art performance on the ASR task and brings significant improvements on the SUPERB speech translation (ST) task. Our code and models are released at https://github.com/mct10/CoBERT.
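
The abstract describes a cross-modal masked-prediction objective: an utterance is converted to discrete codes, a code-modality teacher produces target representations, and a speech-modality student predicts those targets from a masked view of the speech input. The sketch below illustrates that idea in PyTorch under stated assumptions; the model sizes, the random frame mask, the MSE regression loss, and the module names (CodeTeacher, SpeechStudent) are illustrative choices, not the authors' released implementation.

```python
# Illustrative sketch of a CoBERT-style cross-modal objective (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 500   # assumed discrete-code vocabulary size (e.g., k-means clusters)
DIM = 256     # assumed model dimension

class CodeTeacher(nn.Module):
    """Encodes the discrete code sequence (code modality)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, codes):                     # codes: (B, T) int64
        return self.encoder(self.embed(codes))    # (B, T, DIM)

class SpeechStudent(nn.Module):
    """Encodes a masked view of the speech features (speech modality)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(80, DIM)            # assume 80-dim filterbank input
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mask_emb = nn.Parameter(torch.zeros(DIM))

    def forward(self, feats, mask):               # feats: (B, T, 80), mask: (B, T) bool
        x = self.proj(feats)
        # Replace masked frames with a learned mask embedding.
        x = torch.where(mask.unsqueeze(-1), self.mask_emb.expand_as(x), x)
        return self.encoder(x)                    # (B, T, DIM)

def masked_regression_loss(student_out, teacher_out, mask):
    """Regress the teacher's code representations at the masked positions."""
    return F.mse_loss(student_out[mask], teacher_out[mask].detach())

if __name__ == "__main__":
    B, T = 2, 100
    feats = torch.randn(B, T, 80)                 # speech features
    codes = torch.randint(0, VOCAB, (B, T))       # discrete codes for the same utterance
    mask = torch.rand(B, T) < 0.5                 # random frame mask (illustrative)

    teacher, student = CodeTeacher(), SpeechStudent()
    with torch.no_grad():
        targets = teacher(codes)                  # code-modality targets
    preds = student(feats, mask)                  # speech-modality predictions
    loss = masked_regression_loss(preds, targets, mask)
    print(loss.item())
```

Detaching the teacher output mirrors the usual distillation setup in which only the student receives gradients; for the authors' actual training recipe, refer to the released code at https://github.com/mct10/CoBERT.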
