Paper Title
UserLibri: A Dataset for ASR Personalization Using Only Text
Paper Authors
Paper Abstract
Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg. We release this User-Specific LibriSpeech (UserLibri) dataset to aid future personalization research. LibriSpeech audio-transcript pairs are grouped into 55 users from the test-clean dataset and 52 users from test-other. We are able to lower the average word error rate per user across both sets in streaming and nonstreaming models, including an improvement of 2.5 for the harder set of test-other users when streaming.
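The headline metric in the abstract is the average word error rate (WER) per user. As a rough illustration of what such a metric could look like, the sketch below computes a WER for each user from that user's utterances and then takes the unweighted mean across users. The function names, the `(user_id, reference, hypothesis)` input layout, and the choice of an unweighted mean are assumptions made for illustration; the abstract does not specify how the authors aggregate the metric.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def per_user_wer(utterances):
    """Return {user_id: WER in percent}.

    `utterances` is an iterable of (user_id, reference, hypothesis)
    string triples; this layout is hypothetical, not the UserLibri format.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for user_id, ref, hyp in utterances:
        ref_tokens = ref.split()
        errors[user_id] += edit_distance(ref_tokens, hyp.split())
        words[user_id] += len(ref_tokens)
    return {u: 100.0 * errors[u] / words[u] for u in errors if words[u] > 0}


def average_wer_per_user(utterances):
    """Unweighted mean of per-user WERs (an assumed aggregation)."""
    wers = per_user_wer(utterances)
    return sum(wers.values()) / len(wers)


if __name__ == "__main__":
    sample = [
        ("user_a", "the cat sat", "the cat sat"),
        ("user_a", "a b", "a c"),
        ("user_b", "hello world", "hello"),
    ]
    print(per_user_wer(sample))
    print(average_wer_per_user(sample))
```

Averaging per user rather than pooling all utterances gives each user equal weight regardless of how much audio they contributed, which seems a natural fit for a personalization setting, though it is only one of several reasonable choices.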