SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

Paper Title

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

Authors

Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin

Abstract

Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry. In automatic song writing, lyric-to-melody generation and melody-to-lyric generation are two important tasks, both of which usually suffer from the following challenges: 1) paired lyric and melody data are limited, which affects the generation quality of the two tasks, considering that a lot of paired training data are needed due to the weak correlation between lyric and melody; 2) strict alignments are required between lyric and melody, which rely on specific alignment modeling. In this paper, we propose SongMASS to address the above challenges. SongMASS leverages masked sequence-to-sequence (MASS) pre-training and attention-based alignment modeling for lyric-to-melody and melody-to-lyric generation. Specifically, 1) we extend the original sentence-level MASS pre-training to the song level to better capture long contextual information in music, and use a separate encoder and decoder for each modality (lyric or melody); 2) we leverage a sentence-level attention mask and a token-level attention constraint during training to enhance the alignment between lyric and melody. During inference, we use a dynamic programming strategy to obtain the alignment between each word/syllable in the lyric and each note in the melody. We pre-train SongMASS on unpaired lyric and melody datasets, and both objective and subjective evaluations demonstrate that SongMASS generates lyric and melody with significantly better quality than the baseline method without pre-training or alignment constraint.
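The abstract does not spell out the dynamic programming step used at inference, so the following is only a minimal sketch of one plausible form, not the authors' algorithm. It assumes an attention matrix `attn` whose entry `attn[i, j]` is the weight that note `i` places on syllable `j`, and it assumes a simplified alignment in which every note maps to exactly one syllable, syllable indices never decrease, and no syllable is skipped. The function name `monotonic_align` and these constraints are illustrative assumptions.

```python
import numpy as np


def monotonic_align(attn: np.ndarray) -> list[int]:
    """Return, for each note, the index of the syllable it is aligned to.

    Hypothetical helper: maximizes the summed attention weight along a
    monotonic path from (first note, first syllable) to
    (last note, last syllable). This is an assumed simplification.
    """
    n_notes, n_syl = attn.shape
    if n_notes < n_syl:
        raise ValueError("this sketch needs at least one note per syllable")

    # score[i, j]: best total weight of a path ending with note i on syllable j
    score = np.full((n_notes, n_syl), -np.inf)
    back = np.zeros((n_notes, n_syl), dtype=int)
    score[0, 0] = attn[0, 0]

    for i in range(1, n_notes):
        # syllable j is reachable at note i only if j <= i
        for j in range(min(i, n_syl - 1) + 1):
            stay = score[i - 1, j]                             # same syllable
            move = score[i - 1, j - 1] if j > 0 else -np.inf   # next syllable
            if move > stay:
                score[i, j] = move + attn[i, j]
                back[i, j] = j - 1
            else:
                score[i, j] = stay + attn[i, j]
                back[i, j] = j

    # Backtrack from the last note, which must sit on the last syllable.
    path = [n_syl - 1]
    for i in range(n_notes - 1, 0, -1):
        path.append(back[i, path[-1]])
    return [int(j) for j in reversed(path)]


# Toy example: 4 notes, 3 syllables; the first two notes share syllable 0.
toy_attn = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
])
print(monotonic_align(toy_attn))  # [0, 0, 1, 2]
```

In this toy case the path keeps the first two notes on the first syllable, which is the one-syllable-to-many-notes situation that a strict lyric-melody alignment has to handle. Covering the opposite case (several syllables sung on one note) would need an extra transition that this sketch leaves out.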
