Paper Title

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

Authors

Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna

Abstract

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.
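To make "n-way parallel" concrete, here is a minimal Python sketch on hypothetical toy records. The sentence IDs, language codes and transcripts below are invented for illustration and are not taken from FLEURS, and audio is omitted so that only transcripts are modeled. Because every sentence is recorded in every language, grouping by sentence ID yields aligned pairs for any source/target direction, which is what lets the same data serve translation and retrieval. The sketch also works through the arithmetic implied by the abstract: 102 languages give 102 × 101 = 10,302 ordered language directions, and roughly 12 hours per language corresponds to roughly 1,224 hours of speech in total.

```python
# Hypothetical toy records: (sentence_id, language_code, transcript).
# Invented for illustration; in an n-way parallel corpus every
# sentence_id is present in every language.
RECORDS = [
    (1, "en", "The cat sat on the mat."),
    (1, "fr", "Le chat était assis sur le tapis."),
    (1, "de", "Die Katze saß auf der Matte."),
    (2, "en", "It is raining."),
    (2, "fr", "Il pleut."),
    (2, "de", "Es regnet."),
]


def group_by_sentence(records):
    """Map each sentence_id to a {language_code: transcript} dict."""
    groups = {}
    for sid, lang, text in records:
        groups.setdefault(sid, {})[lang] = text
    return groups


def parallel_pairs(groups, src, tgt):
    """Return aligned (source, target) transcripts for one direction."""
    return [
        (langs[src], langs[tgt])
        for _, langs in sorted(groups.items())
        if src in langs and tgt in langs
    ]


def num_directions(num_languages):
    """Number of ordered language pairs in an n-way parallel set."""
    return num_languages * (num_languages - 1)


groups = group_by_sentence(RECORDS)
print(parallel_pairs(groups, "fr", "de"))
print(num_directions(102))  # 102 * 101 = 10302 ordered directions
print(102 * 12)             # about 12 h per language -> roughly 1224 h total
```

Any pair of languages can be aligned without extra annotation, since the shared sentence ID is the only join key needed.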
