On the Utility of Self-supervised Models for Prosody-related Tasks

Paper Title

On the Utility of Self-supervised Models for Prosody-related Tasks

Authors

Lin, Guan-Ting, Feng, Chi-Luen, Huang, Wei-Ping, Tseng, Yuan, Lin, Tzu-Han, Li, Chen-An, Lee, Hung-yi, Ward, Nigel G.

Abstract

Self-Supervised Learning (SSL) from speech data has produced models that have achieved remarkable performance in many tasks, and that are known to implicitly represent many aspects of information latently present in speech signals. However, relatively little is known about the suitability of such models for prosody-related tasks or the extent to which they encode prosodic information. We present a new evaluation framework, SUPERB-prosody, consisting of three prosody-related downstream tasks and two pseudo tasks. We find that 13 of the 15 SSL models outperformed the baseline on all the prosody-related tasks. We also show good performance on two pseudo tasks: prosody reconstruction and future prosody prediction. We further analyze the layerwise contributions of the SSL models. Overall we conclude that SSL speech models are highly effective for prosody-related tasks.
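The abstract mentions analyzing the layerwise contributions of the SSL models but does not say how they are measured. One common approach in SUPERB-style evaluation combines the hidden states of all layers with a learnable, softmax-normalized weighted sum and then inspects those weights. The sketch below illustrates that idea only under that assumption. It is not the authors' code. The function names `softmax` and `weighted_layer_sum` are hypothetical, and each layer is simplified to a single pooled feature vector instead of per-frame features.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-layer logits (hypothetical helper)."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(
    layers: Sequence[Sequence[float]], logits: Sequence[float]
) -> List[float]:
    """Combine per-layer feature vectors with softmax-normalized weights.

    Assumption: layers[i] is one pooled feature vector taken from layer i of
    an SSL model, and every layer has the same feature dimension. In a real
    setup the logits would be trained jointly with the downstream model.
    """
    if len(layers) != len(logits):
        raise ValueError("need exactly one logit per layer")
    dim = len(layers[0])
    if any(len(layer) != dim for layer in layers):
        raise ValueError("all layers must share the same feature dimension")
    weights = softmax(logits)  # these weights are what a layerwise analysis inspects
    return [sum(w * layer[d] for w, layer in zip(weights, layers)) for d in range(dim)]


if __name__ == "__main__":
    toy_layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "layers"
    print(softmax([0.0, 0.0, 0.0]))
    print(weighted_layer_sum(toy_layers, [0.0, 0.0, 0.0]))
```

With equal logits every layer receives weight 1/3, so the output is the plain average of the layer vectors. After training, a layer whose weight grows relative to the others would be read as contributing more to the downstream task.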
