Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-End Singing Voice Synthesis

Paper title

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

Authors

Tao Wang, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen

Abstract

End-to-end singing voice synthesis (SVS) is attractive because it avoids the need for pre-aligned data. However, the automatically learned alignment between singing voice and lyrics is difficult to match with the duration information in the musical score, which leads to model instability or even failure to synthesize voice. To learn accurate alignment information automatically, this paper proposes an end-to-end SVS framework named Singing-Tacotron. The main difference between the proposed framework and Tacotron is that the synthesized voice can be controlled significantly by the duration information in the musical score. First, we propose a global duration control attention mechanism for the SVS model. This attention mechanism can control the duration of each phoneme. Second, a duration encoder is proposed to learn a set of global transition tokens from the musical score. These transition tokens help the attention mechanism decide, at each decoding step, whether to move to the next phoneme or stay on the current one. Third, to further improve the model's stability, a dynamic filter is designed to help the model overcome noise interference and pay more attention to local context information. Subjective and objective evaluations verify the effectiveness of the method. Furthermore, the role of the global transition tokens and the effect of duration control are explored. Examples of experiments can be found at https://hairuo55.github.io/SingingTacotron.
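The abstract describes the stay-or-move decision only in words, so the sketch below is a generic monotonic attention recursion of that kind, not the paper's formulation. All names (`monotonic_step`, `align`, `move_prob`) are hypothetical. In the paper the transition tokens are learned by a duration encoder; here they are replaced, purely as an assumption for illustration, by a fixed heuristic that sets the move probability of each phoneme to one over its duration in frames.

```python
import numpy as np


def monotonic_step(alpha_prev, move_prob):
    """One decoding step of a stay-or-move attention update.

    alpha_prev: (N,) attention weights over N phonemes, summing to 1.
    move_prob:  (N,) probability of leaving phoneme i for phoneme i + 1.
    """
    # Mass that stays on the same phoneme.
    stay = alpha_prev * (1.0 - move_prob)
    # Mass that advances by exactly one phoneme.
    moved = np.zeros_like(alpha_prev)
    moved[1:] = alpha_prev[:-1] * move_prob[:-1]
    alpha = stay + moved
    # The last phoneme has no successor, so its "moving" mass stays put.
    alpha[-1] += alpha_prev[-1] * move_prob[-1]
    return alpha


def align(durations_in_frames, n_steps):
    """Roll the recursion forward and return an (n_steps, N) alignment."""
    durations = np.asarray(durations_in_frames, dtype=float)
    # Assumption for illustration only: a fixed heuristic stands in for
    # the learned transition tokens. Longer notes get a lower move
    # probability, so attention dwells on them for more steps.
    move_prob = 1.0 / durations
    alpha = np.zeros(len(durations))
    alpha[0] = 1.0  # decoding starts on the first phoneme
    rows = []
    for _ in range(n_steps):
        rows.append(alpha)
        alpha = monotonic_step(alpha, move_prob)
    return np.stack(rows)


if __name__ == "__main__":
    # Three phonemes with score durations of 2, 3 and 1 frames.
    print(np.round(align([2, 3, 1], 6), 3))
```

Because attention mass can only stay or advance by one phoneme per step, the alignment is monotonic by construction, and the move probability is the single knob through which score durations influence how long each phoneme is held.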
