UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

Paper title

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

Authors

Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract

We present the UTokyo-SaruLab mean opinion score (MOS) prediction system submitted to VoiceMOS Challenge 2022. The challenge is to predict the MOS values of speech samples collected from previous Blizzard Challenges and Voice Conversion Challenges for two tracks: a main track for in-domain prediction and an out-of-domain (OOD) track for which there is less labeled data from different listening tests. Our system is based on ensemble learning of strong and weak learners. Strong learners incorporate several improvements to the previous fine-tuning models of self-supervised learning (SSL) models, while weak learners use basic machine-learning methods to predict scores from SSL features. In the Challenge, our system had the highest score on several metrics for both the main and OOD tracks. In addition, we conducted ablation studies to investigate the effectiveness of our proposed methods.
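The abstract does not say how the strong and weak learners are combined, so the following is only a minimal sketch of one common ensembling pattern (stacking), not the authors' implementation. It assumes the strong learner's scores are already computed, uses toy two-dimensional vectors as stand-ins for SSL features, and uses per-dimension least-squares regressors as stand-ins for the "basic machine-learning methods". All names and numbers are hypothetical.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a*x + b on 1-D inputs."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var else 0.0
    return a, my - a * mx


def fit_weak_learners(features, mos):
    """Fit one 1-D regressor per feature dimension (hypothetical weak learners)."""
    dims = len(features[0])
    return [fit_linear([f[d] for f in features], mos) for d in range(dims)]


def base_predictions(weak_models, strong_scores, features):
    """Collect the strong score and every weak prediction for each sample."""
    rows = []
    for score, feat in zip(strong_scores, features):
        weak = [a * feat[d] + b for d, (a, b) in enumerate(weak_models)]
        rows.append([score] + weak)
    return rows


def fit_meta(rows, mos):
    """Meta-learner (assumption): linear calibration of the mean base prediction."""
    return fit_linear([mean(r) for r in rows], mos)


def predict(meta, rows):
    a, b = meta
    # MOS ratings lie on a 1-5 scale, so clip the final prediction.
    return [min(5.0, max(1.0, a * mean(r) + b)) for r in rows]


# Toy data: invented feature vectors, listener MOS labels, and strong-learner scores.
features = [[0.1, 0.9], [0.4, 0.6], [0.7, 0.3], [0.9, 0.2]]
mos = [1.5, 2.5, 3.5, 4.5]
strong_scores = [1.8, 2.4, 3.6, 4.2]

weak_models = fit_weak_learners(features, mos)
rows = base_predictions(weak_models, strong_scores, features)
meta = fit_meta(rows, mos)
preds = predict(meta, rows)
print([round(p, 2) for p in preds])
```

For brevity the meta-learner here is fitted on the same samples as the base learners; a real stacking setup would normally train it on held-out (out-of-fold) base predictions to avoid overfitting.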
